ABSTRACT


The quality of realism in virtual environments is typically considered to be a
function of visual and audio fidelity taken independently of each other. However, the
virtual environment participant, being human, is multi-modal by nature. Therefore, in
order to more accurately validate the levels of auditory and visual fidelity required in a
virtual environment, a better understanding is needed of the intersensory or cross-modal
effects between the auditory and visual sense modalities.

To identify whether any pertinent auditory-visual cross-modal perception
phenomena exist, 108 subjects participated in three main experiments that were
completely automated using HTML, Java, and JavaScript. Visual and auditory display
quality perception were measured intramodally and intermodally by manipulating visual
display pixel resolution and Gaussian white noise level, and by manipulating auditory
display sampling frequency and Gaussian white noise level.

Statistically significant results indicate that 1) medium- or high-quality auditory
displays coupled with high-quality visual displays increase the perceived quality of the
visual displays relative to the evaluation of the visual display alone, and 2) low-quality
auditory displays coupled with high-quality visual displays decrease the perceived quality
of the auditory displays relative to the evaluation of the auditory display alone.
These findings strongly suggest that the quality of realism in virtual environments must
be a function of auditory and visual display fidelity considered together, not
independently of each other.
I.  INTRODUCTION

A.  MOTIVATION 

The fidelity requirements for virtual environments have traditionally focused on
the single modality of vision. In an attempt to render visual displays as close as possible
to the fidelity of the human visual system, the fidelity of visual display systems has
increased dramatically in the last ten years. Likewise, better audio technology has
recently produced a surge of emphasis on the fidelity requirements of the single modality
of audition, and the fidelity of auditory display systems has increased dramatically in the
last five years. These rapid advances in visual and auditory display technologies have
helped to create increasingly realistic virtual environments. The quality of realism in
these virtual environments is typically considered to be a function of visual and audio
fidelity taken independently of each other [BARF95]. Herein lies a problem: the virtual
environment participant, being human, is multi-modal by nature. Thus, the quality of
realism in virtual environments, and therefore their fidelity requirements, needs to be
based on multi-modal criteria comprising all of our senses rather than on the current
single-modality criteria. However, insufficient experimental data exist to make informed
multi-modal design decisions.

B.  OBJECTIVE 

Because of the limitations of current computer technology, it is impossible to
render realistic information to all of the senses of an interactive virtual environment
participant in real time. However, since there have been significant advances in visual
and audio display technology, it is appropriate to concentrate on the vision and audition
sensory modalities. Accordingly, this research effort focuses on these two sensory
modalities. In particular, its objective is to gain a better understanding of the
intersensory or cross-modal effects between the auditory and visual sense modalities.
With a better understanding of auditory-visual cross-modal effects, system designers can
more accurately verify and validate the levels of auditory and visual fidelity required for
the immersed virtual environment participant.

C.  SCOPE

Intersensory phenomena have been studied for many years by researchers in
numerous disciplines, such as Psychoacoustics, Psychology, Physiology, Neurology,
Philosophy, Musicology, Ecology, and Computer-Human Interaction, and by different
organizations and communities, such as Human Factors, the Audio Engineering Society,
the Acoustical Society of America, the Department of Defense, the artistic community,
and the film and entertainment industry. Thus, there is a large body of intersensory
research, but this knowledge often stays within the discipline from which it was derived.
Consequently, there is little cross-disciplinary transfer of intersensory knowledge. This
lack of cross-disciplinary exchange is not limited to intersensory research; it seems to
extend to many areas of academic and commercial interest. This is unfortunate, since it
likely leads to many redundant research efforts. Nevertheless, in terms of modeling and
simulation, the National Research Council (NRC) has recently investigated possible
collaboration opportunities between the Department of Defense and the entertainment
industry [ZYDA97]. This collaboration is a much-needed first step towards better
cross-disciplinary knowledge transfer.

Computer Science in particular is severely lacking in its knowledge and use of
intersensory phenomena. It is therefore important to note that this effort is approached
from the perspective of a computer scientist, for use by other computer scientists. The
results of this effort are intended to help computer scientists develop better virtual worlds
through appropriate use of auditory and visual display fidelities based on auditory-visual
cross-modal perception phenomena. It is also important to note that the goal of this effort
is not to identify absolute visual and/or audio fidelity requirements, such as pixel
resolution and sampling frequency respectively, but rather to identify auditory-visual
cross-modal perception effects that can be used to justify a certain level of audio and/or
visual fidelity.

D.  APPROACH

The approach taken is that of the experimental psychologist. A series of
experiments was designed to determine whether any pertinent auditory-visual cross-modal
perception interactions exist. Specifically, one pilot study and three main experiments
were conducted. Each of the three main experiments was completely automated using
HyperText Markup Language (HTML), Java, and JavaScript [FLAN96] [LADD98]. The
pilot study was also completely automated but was developed using the Virtual Reality
Modeling Language (VRML) [HART96] [LEAR96] [ROEH97]. All experiments were
conducted at the Naval Postgraduate School (NPS) in Monterey, California. A total of
130 volunteers drawn from the students, faculty, staff, and guests of NPS served as
subjects. Each experiment used a 3x3 factorial within-subjects design. (See [GOOD95]
for a description of factorial design experiments.) The two independent variables were
visual and audio display quality, each with three levels: low, medium, and high quality.
The visual display parameters that were manipulated were pixel resolution and Gaussian
white noise level. The audio display parameters that were manipulated were sampling
frequency and Gaussian white noise level. Partial counterbalancing was achieved through
the technique of balanced Latin squares. (See [GOOD95] for a description of the Latin
squares technique.) The basic idea of the experiments was to manipulate visual and
auditory display parameters intra-modally and inter-modally and, likewise, to measure
visual and auditory display perception intra-modally and inter-modally. During each
experiment, which lasted approximately 30 minutes, a single subject wore headphones and
sat in front of a 20-inch display monitor. The subject's task was to rate the perceived
quality of audio-only, visual-only, and audio-visual displays on Likert rating scales
ranging from 1 to 7. (See [GOOD95] for a description of Likert rating scales.) Thus, the
dependent variables are the perception of visual display quality and the perception of
auditory display quality. It is hoped that by carefully varying the fidelity of both auditory
and visual displays, it will be possible to measure auditory-visual cross-modal perception
interactions. Specifically, this effort aims to answer the following question: in an
audio-visual display, what effect (if any) do various audio quality levels have on the
perception of visual quality, and vice versa? The following are some examples:

1) Are changes in the audio and/or visual qualities of an audio-visual display
perceivable, and can these changes also be attended to?

2)  Does  a  high-quality  auditory  display  coupled  with  a  low-quality  visual  display 
cause  a  decrease/increase  in  the  perception  of  audio  quality  and/or  an  increase/decrease  in 
the  perception  of  visual  quality  relative  to  established  baseline  conditions  derived  from 
auditory-only  and  visual-only  quality  perception  evaluations? 

3)  Does  a  low-quality  auditory  display  coupled  with  a  high-quality  visual  display 
cause  an  increase/decrease  in  the  perception  of  audio  quality  and/or  a  decrease/increase  in 
the  perception  of  visual  quality  relative  to  established  baseline  conditions  derived  from 
auditory-only  and  visual-only  quality  perception  evaluations? 

4)  Does  a  low-quality  auditory  display  coupled  with  a  low-quality  visual  display 
cause  a  decrease/increase  in  the  perception  of  audio  quality  and/or  a  decrease/increase  in 
the  perception  of  visual  quality  relative  to  established  baseline  conditions  derived  from 
auditory-only  and  visual-only  quality  perception  evaluations? 

5)  Does  a  high-quality  auditory  display  coupled  with  a  high-quality  visual  display 
cause  an  increase/decrease  in  the  perception  of  audio  quality  and/or  an  increase/decrease 
in  the  perception  of  visual  quality  relative  to  established  baseline  conditions  derived  from 
auditory-only  and  visual-only  quality  perception  evaluations? 
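The balanced Latin square counterbalancing mentioned in the approach can be illustrated with a short Java sketch. It is not the experiment software used in this effort: the class name BalancedLatinSquare, the encoding of the nine audio-visual conditions as condition = 3 * visualLevel + audioLevel, and the use of the Williams construction are assumptions made for illustration. A Latin square gives only partial counterbalancing: instead of all 9! (362,880) possible orders, each sequence is a cyclic shift of a single zig-zag row, so within each square every condition appears exactly once in every serial position. Because nine conditions is an odd number, the sketch also appends the mirror image of each row, giving 18 sequences in which every condition immediately precedes every other condition exactly twice.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: balanced (Williams) Latin square for ordering the conditions of a
 * within-subjects design. Hypothetical helper, not the original experiment code.
 */
public class BalancedLatinSquare {

    /** Returns balanced presentation orders for n conditions (n rows if n is even, 2n if odd). */
    public static List<int[]> build(int n) {
        // First row uses the zig-zag pattern 0, 1, n-1, 2, n-2, 3, ...
        int[] first = new int[n];
        int low = 1;
        int high = n - 1;
        for (int j = 1; j < n; j++) {
            first[j] = (j % 2 == 1) ? low++ : high--;
        }

        // Each further row is the first row shifted by i (mod n).
        List<int[]> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            int[] row = new int[n];
            for (int j = 0; j < n; j++) {
                row[j] = (first[j] + i) % n;
            }
            rows.add(row);
        }

        // For odd n, appending the reversed rows balances first-order carryover:
        // every ordered pair of conditions is then adjacent exactly twice.
        if (n % 2 == 1) {
            for (int i = 0; i < n; i++) {
                int[] row = rows.get(i);
                int[] reversed = new int[n];
                for (int j = 0; j < n; j++) {
                    reversed[j] = row[n - 1 - j];
                }
                rows.add(reversed);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] levels = {"low", "medium", "high"};
        List<int[]> orders = build(9);
        for (int s = 0; s < orders.size(); s++) {
            StringBuilder line = new StringBuilder("Sequence " + (s + 1) + ":");
            for (int c : orders.get(s)) {
                // Hypothetical encoding: c = 3 * visualLevel + audioLevel.
                line.append(" V-").append(levels[c / 3]).append("/A-").append(levels[c % 3]);
            }
            System.out.println(line);
        }
    }
}
```

Under this sketch, subject k would receive sequence k modulo 18, so each group of 18 subjects completes one full replication of the counterbalancing scheme.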
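The display manipulations named in the approach (reduced pixel resolution, reduced sampling frequency, and added Gaussian white noise) can likewise be sketched in Java. This is a hypothetical illustration, not the stimulus-preparation procedure used in this dissertation: the class DisplayDegradation and its methods are invented names, the image is assumed to be a rectangular 8-bit grayscale array, the audio is assumed to be samples scaled to [-1, 1], pixel resolution is lowered by simple block replication, and sampling frequency is lowered by keeping every factor-th sample.

```java
import java.util.Random;

/** Hypothetical sketch of the visual and auditory degradations; names are illustrative only. */
public class DisplayDegradation {

    /** Simulates a lower pixel resolution by copying the top-left pixel of each block x block tile. */
    public static int[][] pixelate(int[][] gray, int block) {
        int h = gray.length, w = gray[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                out[y][x] = gray[(y / block) * block][(x / block) * block];
            }
        }
        return out;
    }

    /** Adds zero-mean Gaussian white noise (standard deviation sigma) to gray levels, clamped to 0..255. */
    public static int[][] addImageNoise(int[][] gray, double sigma, Random rng) {
        int h = gray.length, w = gray[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                long v = Math.round(gray[y][x] + sigma * rng.nextGaussian());
                out[y][x] = (int) Math.max(0, Math.min(255, v));
            }
        }
        return out;
    }

    /**
     * Simulates a lower sampling frequency by keeping every factor-th sample.
     * No anti-aliasing low-pass filter is applied, so content above the new Nyquist limit aliases.
     */
    public static double[] decimate(double[] samples, int factor) {
        double[] out = new double[(samples.length + factor - 1) / factor];
        for (int i = 0; i < out.length; i++) {
            out[i] = samples[i * factor];
        }
        return out;
    }

    /** Adds zero-mean Gaussian white noise to audio samples in [-1, 1], clipping the result to that range. */
    public static double[] addAudioNoise(double[] samples, double sigma, Random rng) {
        double[] out = new double[samples.length];
        for (int i = 0; i < samples.length; i++) {
            out[i] = Math.max(-1.0, Math.min(1.0, samples[i] + sigma * rng.nextGaussian()));
        }
        return out;
    }
}
```

For example, decimating a 44.1 kHz recording by a factor of 4 yields an effective sampling frequency of 11.025 kHz. Because no low-pass filter is applied first, frequencies above the new 5.5125 kHz Nyquist limit alias rather than simply disappearing; a more careful resampler would filter before decimating.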

E.  LIMITATIONS

One self-imposed limitation of this effort was to confine all software development to
ever-evolving Internet technology. The reasons for this are as follows:

1) To easily obtain software. All the software used to execute the experiments in
this effort was simply downloaded. This downloaded software included Netscape 2.0,
3.0, and 4.0 [NETS98]; Sun's Java Development Kit (JDK) 1.0, 1.1.2, 1.1.4, and 1.1.5
[SUNM98]; Silicon Graphics Inc. (SGI) CosmoPlayer VRML 2.0 beta Netscape Plugin
and VRML 2.0 Release Netscape Plugin [COSM98]; Sony's Community Place VRML
2.0 Browser [SONY98b]; and Intervista's WorldView 2.0 Browser [INTE98].

2) To reduce cost. All downloaded software was free of charge.

3) To verify the feasibility of conducting scientific experiments with
HTML/Java/JavaScript/VRML.

4)  To  support  seamless  portability  and  repeatability  of  research.  The  experiments 
outlined  in  this  dissertation  are  currently  being  set  up  to  be  repeated  at  the  College  of 
Computing  at  Georgia  Institute  of  Technology  in  Atlanta,  Georgia. 

5) To eventually conduct on-line auditory-visual cross-modal experiments that
could involve thousands (if not millions) of subjects or trials.

Another chosen limitation concerned hardware. To complement the ease of
access and portability of all software, all the hardware used in this effort is available as
commercial off-the-shelf (COTS) products. As such, no specialized, hard-to-obtain, or
prohibitively expensive hardware is needed for this research effort.

F.  DISSERTATION ORGANIZATION

This dissertation is organized into ten chapters, followed by a list of references, a
bibliography, and four appendices. Chapter II discusses relevant background material,
including Perception, The Senses, Audition, Vision, Attention, Gestalt Theory,
Synesthesia, and Multimedia. Chapter III presents a thorough literature review covering
Virtual Environments (VE), Auditory-Visual Perceptual Organization, Auditory-Visual
Art Forms and Film, Auditory-Visual Cross-Modal Matching, Visual Dominance Over
Audition, Auditory-Visual Threshold Perception, and Auditory-Visual Suprathreshold
Perception. Chapter IV discusses the issues relevant to the overall development of the
experimental design process, including Motivation, Design Considerations, Design
Selections, and Software Design. Chapter V discusses Visual Display Development,
Auditory Display Development, and Auditory-Visual Display Development. Chapter VI
gives a complete description of the experimental design of the initial pilot study,
including Location, Participants, Apparatus, Procedure, Results and Discussion, and
Summary and Conclusions. Chapter VII gives a complete description of the experimental
design involving visual display pixel resolution manipulation of a static radio image, as
well as auditory display sampling frequency manipulation of a section of music,
including Location, Participants, Apparatus, Procedure, Changes from Pilot Study, Data
Collection and Analysis, Results and Discussion, and Summary and Conclusions.
Chapter VIII gives a complete description of the experimental design involving visual
display Gaussian white noise level manipulation of a static radio image, as well as
auditory display Gaussian white noise level manipulation of a section of music, including
Location, Participants, Apparatus, Procedure, Results and Discussion, and Summary and
Conclusions. Chapter IX gives a complete description of the experimental design
involving visual display pixel resolution manipulation of a fruit-flower scene, as well as
auditory display sampling frequency manipulation of a section of music, including
Location, Participants, Apparatus, Procedure, Results and Discussion, and Summary and
Conclusions. Chapter X presents the overall findings of this dissertation, including
Overall Results, Conclusions, Impact, Observations, Recommendations, Future Work,
and Final Thoughts.


II.  BACKGROUND 

A.  INTRODUCTION 

The  intent  of  this  chapter  is  to  give  the  computer  scientist  a  high-level  overview 
of  some  of  the  basic  background  knowledge  which  is  required  in  order  to  understand  this 
multi-disciplinary  research  effort.  As  such,  the  information  outlined  in  this  chapter  is  by 
no  means  comprehensive.  Furthermore,  the  concepts  outlined  in  this  chapter  lay  the 
foundation  for  understanding  the  scope  of  this  research  effort.  Because  of  the  wide 
variety  of  topics  covered  including  Perception,  The  Senses,  Audition,  Vision,  Attention 
Theory,  Gestalt  Theory,  Synesthesia,  and  Multimedia,  the  reader  will  hopefully  gain  a 
better  appreciation  for  the  interdisciplinary  nature  and  breadth  of  knowledge  required 
when  conducting  intersensory  research. 

B.  PERCEPTION 

1.  Definition 

First and foremost, it is important to remember that "We can only obtain a rather
one-sided idea of the development of perception if we neglect the interrelations of the
different senses in creating our perceptual world" [SCHL35]. With this in mind, a formal
definition of perception from a psychological point of view is as follows:

The  psychology  of  perception,  then,  involves  the  study  of  the  way  an  observer  relates 
to  his  environment  —  the  way  in  which  information  is  gathered  and  interpreted  by  an 
observer.  This  relationship  is  the  result  of  a  continuing  process  of  learning,  judging, 
interpreting,  and  reacting  to  the  environment  which  begins  at  birth  and  continues 
throughout  the  life  span  of  the  individual.  [MURC73] 

From  a  physiological  perspective,  the  following  describes  the  nature  of  a  stimulus: 

An  excitation  originating  in  any  of  the  receptors  does  not  remain  strictly  localized,  but 
irradiates  to  some  extent  throughout  the  entire  nervous  system,  thus  affecting  the 
excitatory  states  of  all  other  mechanisms  and  consequently  the  sensory  responses  for 
which  such  excitatory  states  are  important  predisposing  factors.  [GILB41] 


2.  Stimulus 

A  stimulus  is  defined  as  "...any  chemical  or  physical  activator  which  causes  a 
response  in  a  receptor"  [FOST68].  In  total,  there  are  only  six  classes  of  stimuli:  (1) 
mechanical,  (2)  thermal,  (3)  photic,  (4)  acoustic,  (5)  chemical,  and  (6)  electrical. 
Furthermore, an effective stimulus is one that produces a sensation, whose dimensions
are quality, intensity, extension, duration, and like and dislike [FOST68].

Murch explains that the term stimulus is but half of a pair of correlated terms, the
other half being response. As such, if we conform strictly to this correlated definition of
stimulus, a circular definition results: "This concept of stimulus would force us to regard
the response as dependent on the object or event (stimulus) and the stimulus as dependent
on the response" [MURC73]. Hermann von Helmholtz tried to avoid this circular
definition by introducing the concepts of distal stimulus (the external object or event) and
proximal stimulus (the sensory representation of the stimulus by the nervous system)
[HELM66]. However, Helmholtz's concepts of distal and proximal stimulus fall short
because the circularity problem remains: "The distal stimulus gives rise to the proximal
stimulus which in turn contributes to the building of a percept representative of the initial
distal stimulus" [MURC73]. The distinction between distal and proximal stimuli is
better explained by using the terms potential stimulus and effective stimulus [GIBS66]
[GIBS67].

Any  object  or  event  in  the  environment  is  a  potential  stimulus.  When  such  a  potential 
stimulus  stands  in  a  constant  relationship  with  a  given  response,  it  is  an  effective  stimulus. 
Thus  we  are  able  to  describe  the  environment  independently  of  the  responses  of  an 
observer.  This  is  particularly  important  when  we  consider  that  one  is  often  unaware  of  all 
the  responses  elicited  by  a  stimulus.  [MURC73] 

The inherent linkage between sensation and perception can best be summed up as
follows: "To sense is to respond, to perceive is to know" [MURC73].

But what happens when we are exposed to multiple stimuli? When two or more
stimuli occur at the same time and/or location, some very interesting perceptual
phenomena arise. The cause of these phenomena can be explained as follows: "When two
qualitatively different stimuli are applied to the same locus on the sensory surface very
rapidly, rapidly enough so that the two stimuli are perceived as a single event, the
perceptual qualities of the two [stimuli] merge" [MARK78]. Responses to multiple stimuli
and the resulting sensory interactions are the crux of this dissertation. Some of the
well-known and accepted intersensory theories and perspectives are presented in the next
section.

C.  THE SENSES

1.  Classification 

The concept of separate sense modalities has a long history, with roots dating back
to the time of Aristotle (circa 384-322 B.C.) [WALK81]. Although we typically believe
we have only five senses, we may have as many as 30 or 40 senses, depending on how
the senses are classified. One such classification divides the senses into the following
modalities: Vision, Audition, Cutaneous Sensitivity, Olfaction, Gustation, Kinesthesis,
Labyrinthine Sensitivity, and Organic Sensitivity [FOST68]. Figure 1 depicts this
classification of the senses along with the associated sense organs, stimuli, and sensory
qualities.


Modality | Sense Organ | Peripheral Nerve Endings | Projection | Normal Stimulus | Sensory Qualities
Vision | eye | rods and cones of retina | occipital lobe | photic energy | colors (red, gray)
Audition | ear | hair cells of organ of Corti | temporal lobe | acoustic energy | tones and noises
Cutaneous sensitivity | skin | specialized and free nerve endings | parietal lobe | mechanical and thermal energy | pressure, pain, heat, cold
Olfaction | olfactory cleft of nostril | rods of olfactory epithelium | rhinencephalon | volatile substances | odors (fragrant, spicy)
Gustation | tongue and mouth region | taste buds of papillae | parietal lobe | soluble substances | sweet, salt, sour, bitter
Kinesthesis | muscles, joints, tendons | specialized and free nerve endings | parietal lobe | mechanical energy | pressure, pain
Labyrinthine sensitivity | nonauditory labyrinth | hair cells of crista and macula | none (?), projects to the cerebellum | mechanical forces and gravity |
Organic sensitivity | portions of gastrointestinal tract | specialized and free nerve endings | parietal lobe | mechanical energy | pain, pressure

Figure 1. Classification of the Senses From [FOST68].


2.  Sensory  Interaction 

In  1940,  Ryan  [RYAN40]  conducted  a  thorough  literature  survey  on  sensory 
interaction.  Based  on  the  intersensory  research  investigated,  the  following  are  some  of 
Ryan's  findings: 


(1) ...it is extremely rare outside of the controlled conditions of the laboratory that
even a single object is the product of operations of a single sensory system.

(2) Under certain conditions it can be shown that qualities perceived by one sensory
system are influenced by stimuli reaching other sense organs.

(3) ...it is evident that sensory systems are part of a unified organism and by no means
isolated from one another. [RYAN40]

Ryan ultimately concludes that the study of the interrelations among the senses is
"...sorely in need of further investigation..." [RYAN40].

In 1941, Gilbert [GILB41] conducted another extensive literature review on
intersensory facilitation and inhibition. Interestingly, Ryan was unaware of Gilbert's
work until after his own was published, and Gilbert does not mention Ryan's efforts.
Nevertheless, Gilbert draws the following conclusions concerning the effect of
heteromodal (intersensory) stimulation on sensitivity to stimulus intensity:

(1)  Under  conditions  of  momentary  heteromodal  stimulation  (a)  a  sufficiently  intense 
stimulus  will  momentarily  reduce  sensitivity  in  another  modality,  and  increase  it  after  an 
optimum interval (about 1/2 sec); (b) a less intense heteromodal stimulus will
momentarily increase sensitivity.

(2)  Under  conditions  of  prolonged  stimulation,  there  is  some  evidence  that  the  quality 
of  the  heteromodal  stimulus  may  determine  the  direction  of  the  effect,  some  stimuli 
acting  as  excitants,  others  as  depressants.  It  is  not  clear,  however,  whether  there  is  a 
differential  effect  among  the  various  modalities. 

(3) The effect will be limited by the lability of the sensation affected, and individual
differences in their susceptibility to heteromodal influence. [GILB41]

Having reviewed the intersensory research available through 1941, Gilbert realized that
the prevailing view of the psychophysical aspects of intersensory interaction was lacking.
Gilbert's concluding remarks state that:

Modern  psychophysics  has  produced  overwhelming  evidence  of  the  inadequacy  of  the 
traditional  static  relationship  between  stimulus  and  response,  wherein  each  attribute  of  a 
sensory  response  was  conceived  of  as  determined  simply  by  the  value  of  a  corresponding 
physical  dimension  of  the  "adequate"  stimulus.  Actual  experimental  evidence...  has 
shown  that  the  dimensions  of  stimulation  are  inter-dependent  in  affecting  a  sensory 
response,  and  that  sensation  may  be  dependent  on  the  interaction  of  excitations,  on 
mental  set,  physiological  state  of  the  organism,  practice,  and  numerous  other  factors,  all 
interrelated  in  a  constant  state  of  flux.  [GILB41] 

In 1947, Sherrington [SHERR47] attempts to explain higher-order sensory integration
as a process in which "...each sense system is served by specific receptors that project to
specific sensory centers in the brain. Intersensory interaction is the concept by which
multisensory stimuli of the real world (e.g., rhythm) are integrated in the brain"
(summarized by [WALK81]).

In 1954, London [LOND54] presented his findings based on the extensive
intersensory research conducted in the Soviet Union. After reviewing numerous
intersensory experiments, London concludes that the conditions that influence sensory
interaction are best summarized as follows: 1) Strength of accessory stimulus, 2)
Excitatory state of sense organs, 3) Duration of accessory stimulation, 4) Termination of
accessory stimulation, 5) Affectivity of stimulus, 6) Physiological state, 7) Diurnal
variation, and 8) Summation, repetition, and cumulation of accessory effects [LOND54]
[STON68].

In reviewing London's research efforts, Stone and Pangborn note that:

We  respond  to  environmental  stimuli  through  all  avenues  of  sensory  input,  and, 
although  the  extent  of  their  interrelationship  is  not  well  understood,  it  is  generally 
accepted  that  the  stimulation  of  one  sense  organ  influences  to  some  degree  the  sensitivity 
of  the  organs  of  another  sense.  [STON68] 

Stone and Pangborn ultimately conclude that "...there exists a great need for further
definitive [intersensory] studies. Quantification of individual variability in response to
dual stimulation does not seem to have been investigated, nor has three-way stimulation
been reported" [STON68].

In  1966,  Gibson  [GIBS66]  [GIBS79]  suggests  that: 

...  perceptual  systems  cannot  be  gracefully  categorized  in  terms  of  specific  sensory 
systems,  that  under  natural  conditions  many  senses  respond  and  interact  to  environmental 
stimulation,  and  the  organism  itself  is  initiating  rather  than  reacting  to  events.  This  means 
that  intersensory  perception  and  integration  are  not  specialized  higher-order  complex 
reactions, but are the rule for all perception (summarized by [WALK81]).

In other words, it is the particular surrounding environment that determines how our
senses respond and interact. As a result, sensory interaction must be studied in terms of
the complexity of natural life events rather than in simple, isolated systems.


In 1978, a more modern view of sensory interaction is provided by Lawrence
Marks in his excellent book, The Unity of the Senses: Interrelations among the
Modalities [MARK78]. Proceeding from the simplest to the most comprehensive, Marks
describes what he calls the Five Doctrines of sensory correspondence. Briefly, these five
doctrines are as follows:

1.  Doctrine  of  Equivalent  Information.  ...different  senses  can  inform  us  about  the 
same  features  of  the  external  world. 

2.  Doctrine  of  Analogous  Attributes  and  Qualities.  Despite  the  salience  of  the 
phenomenal  differences  among  qualities  of  various  sense  modalities,  there  are  a  few 
properties  held  in  common. 

3.  Doctrine  that  Different  Senses  have  Corresponding  Psychophysical  Properties: 
...this  theory  proposes  that  at  least  some  of  the  ways  the  senses  behave  and  operate  on 
impinging  stimuli  are  general  characteristics  of  sensory  systems,  similar  from  vision  to 
hearing,  from  touch  to  olfaction. 

4.  Doctrine  that  Similar  or  Identical  Neurophysiological  Mechanisms  Parallel 
Sensory  Correspondence.  ...there  is  a  neural  analogue  to  each  of  the  psychological 
doctrines  [the  first  three  doctrines]. 

5.  Doctrine  of  the  Unity  of  the  Senses.  ...incorporates  all  of  the  first  four  theories,  and 
in  which  the  several  senses  are  interpreted  as  modalities  of  a  general,  perhaps  more 
primitive  sensitivity.  [MARK78] 

Based on the intersensory research he reviewed, Marks believes that the dimension
of quality appears to show the fewest similarities from modality to modality, whereas
intensity displays the strongest cross-modal similarity. However, Marks concedes that
"The entire area of cross-modality comparisons of sensory quality has hardly been
explored experimentally" [MARK78]. Furthermore, Marks concludes that any sensory
interaction is highly stimulus-dependent. As Marks explains:

Perhaps  the  most  crucial  factor  in  determining  the  significance  of  any  interaction  is 
the  objective  relationship  between  the  stimuli  that  are  used.  When  stimuli  presented  to 
different  senses  bear  no  meaningful  relation  to  each  other,  interaction  often  seems  to  be 
small  or  nonexistent.  ...But  meaningfully  related  stimuli  are  quite  a  different  matter.  ... 
Meaningful perceptual interactions ... occur when concurrent information enters different
sensory  channels. [MARK78] 

Marks also makes an interesting point that deserves mention:

Similarity  across  the  senses  must  necessarily  be  one  step  removed  from  similarity 
within  a  sense,  for  there  is,  by  definition,  no  continuity  between  modalities.  If  the  senses 
were truly continuous there would only be one sense. [MARK78]

In 1981, based on her research with blind and sighted children, Susanna Millar
[MILL81] concludes that the sense modalities are neither separate nor unitary. "They
[modalities] are some of both, complementary to each other, and information can be used
flexibly from different modalities" [WALK81]. Millar further concludes that "...we are
slowly beginning to understand the interrelationships of the sense modalities. Global
generalizations do not seem to hold. No one current theory seems capable of encompassing
the diversity of findings" [WALK81].

In 1981, O'Connor and Hermelin [OCON81], having conducted experiments with children
suffering from either specific perceptual or general cognitive handicaps, describe
sensory integration through the concept of sensory capture as follows:

One  aspect  of  sensory  integration  can  be  demonstrated  by  the  phenomenon  of 
"sensory  capture,"  in  which  conflicting  input  to  different  sense  modalities  is  often  not 
perceived  as  such.  Instead,  the  observer  seems  to  resolve  such  conflict  by  making  one 
sense  impression  conform  with  another  dominant  one.  ...Such  "capture"  of  one  sensory 
input  by  another  is  of  interest  because  it  suggests  that  there  may  be  a  degree  of  perceptual 
equivalence  between  various  sensory  information,  so  that  the  same  stimulus  qualities  tend 
to  be  perceived  in  various  modalities.  [OCON81] 

3.  Neurological  Perspective 

Because of recent technological advances in neurology, there has been a surge in
intersensory research from a neurological perspective. The reason for this much-deserved
neurological emphasis is that:

...there  has  been  comparatively  little  done  to  understand  the  neural  phenomena  that 
make  multisensory  integration  possible.  The  paucity  of  neural  data  about  multisensory 
integration  is  due  in  part  to  different  strategies  researchers  have  used  to  explore  the 
functional  organization  of  the  nervous  system,  and  also  to  the  inherent  difficulties  in 
conducting  multisensory  studies.  ...For  while  the  perceptual  phenomena  demonstrates 
that  interactions  among  different  sensory  modalities  are  commonplace  and  that 
constancies  among  the  modalities  must  exist  in  order  to  use  them  together  effectively, 
there  is  no  comparable  body  of  literature  describing  the  neural  mechanisms  that  underlie 
them.  Nevertheless,  there  is  a  good  deal  of  information  about  the  location  in  the  brain 
where  inputs  from  different  modalities  converge.  [STEI93] 

One place in the brain where visual, auditory, and somatosensory inputs converge is the
superior colliculus, as depicted in Figure 2. Furthermore, the horizontal and vertical
meridians of the different sensory representations in the superior colliculus are closely
aligned, forming a common coordinate system. Stein and Meredith conclude that this common
coordinate system suggests a representation of Multisensory Space (see Figure 3). By
examining the neurological responses of the superior colliculus in various animals,
primarily the cat, Stein and Meredith have found considerable evidence supporting the
principles of multisensory convergence and interaction based on single-neuron evoked
potentials, as depicted in Figure 4. Stein and Meredith believe that neurological studies
in other animals are very important and lead to a better understanding of human
perception. Thus, based primarily on studies of cats, Stein and Meredith outline the rules,
in terms of space and time, that govern multisensory integration based on unimodal
receptive field characteristics:

Space: spatially coincident multisensory stimuli tend to produce response
enhancement, whereas spatially disparate stimuli produce either depression or no
interaction.

Time: maximal multisensory interactions are not dependent on matching the onset of
two different sensory stimuli, or their latencies, but on how the activity patterns resulting
from the two inputs overlap.

[Overall] ...the spatial register among the receptive fields of multisensory neurons and
their temporal response properties provide a neural substrate for enhancing responses to
stimuli that covary in space and time and for degrading responses that are not spatially
and temporally related. [STEI93]

Figure 2. The Superior Colliculus From [HARV98].

Figure 3. Common Coordinate System in the Superior Colliculus Suggesting
Multisensory Space From [STEI93].

Figure 4. Convergence of Inputs from the Different Senses on
a Single Neuron From [STEI93].
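The space and time rules quoted above can be restated as a toy decision rule. The sketch below is a hypothetical illustration, not Stein and Meredith's own model. Each neuron's response to a stimulus is reduced to an assumed discharge window in milliseconds. The `Activity` class and the example timings are invented. Interaction is predicted from spatial coincidence plus overlap of those windows, not from matching stimulus onsets.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """Hypothetical discharge window (ms) evoked by one unimodal stimulus."""
    start_ms: float
    end_ms: float


def overlap_ms(a: Activity, b: Activity) -> float:
    """Duration over which the two activity patterns overlap (0 if disjoint)."""
    return max(0.0, min(a.end_ms, b.end_ms) - max(a.start_ms, b.start_ms))


def predicted_interaction(spatially_coincident: bool, a: Activity, b: Activity) -> str:
    """Toy encoding of the quoted rules: enhancement requires both spatial
    coincidence and temporal overlap of the activity patterns."""
    if spatially_coincident and overlap_ms(a, b) > 0:
        return "enhancement"
    return "depression or no interaction"


# Assumed example: the sound arrives 40 ms after the light. Auditory latency is
# shorter, so the two discharge windows still overlap although onsets differ.
visual = Activity(start_ms=60.0, end_ms=200.0)    # light presented at t = 0 ms
auditory = Activity(start_ms=55.0, end_ms=150.0)  # sound presented at t = 40 ms
print(overlap_ms(visual, auditory))                    # 90.0
print(predicted_interaction(True, visual, auditory))   # enhancement
print(predicted_interaction(False, visual, auditory))  # depression or no interaction
```

In this sketch, onset asynchrony does not matter by itself. Only the overlap of the activity windows matters, which mirrors the temporal rule.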

Although they found considerable evidence supporting a neurological basis for sensory
integration, Stein and Meredith conclude that "an enormous number of challenges must be
met before we understand more fully the process involved in integrating information from
different sensory modalities" (see Figure 5).


Figure  5.  Neurons  Synthesize  Information  from  Different 
Sensory  Modalities  From  [STEI93]. 


D.       AUDITION 

1.  Definition 

Before  audition  can  be  defined,  we  need  to  have  an  understanding  of  what  is 
meant  by  sound.  The  following  gives  a  formal  definition  of  sound: 

Sound is the perception by humans of vibrations in some physical medium, usually
air. These physical vibrations of the air are evidenced by alternating rarefactions and
compressions. Man's primary sense organ for the sound stimulus is the ear. [SELB68]
(see Figure 6)

The formal definition of hearing (the sense of audition) from a physiological perspective
is  as  follows: 

Hearing  is  the  response  of  an  animal  to  sound  vibrations  by  means  of  a  special  organ 
for  which  such  vibrations  are  the  most  effective  stimulus.  The  critical  phrase  here  is 
"most  effective,"  which  means  that  this  special  organ  (which  we  shall  call  an  ear)  is  more 
sensitive to sound than it is to any other form of energy. All other mechanoreceptors
respond to acoustic vibrations if these vibrations are strong enough and sufficiently low
in frequency, but they do so crudely, requiring large amounts of energy in comparison
with what they require in the stimuli that are most appropriate to them and in relation to
what the ear requires within its proper frequency range. Organs in the skin (tactual and
deep pressure endings), in muscles, tendons, and joints (kinesthetic endings), in the
vestibular  labyrinth  (gravity  and  motion  receptors),  and  even  pain  organs  throughout  the 
body  can  all  be  excited  by  sounds  of  sufficient  strength.  But  none  of  these  organs 
approaches  the  ear  in  delicacy  and  in  the  effectiveness  of  utilization  of  sounds  as  a  means 
of  gaining  information  about  the  outside  world.  [WEVE74] 


Figure 6. The Ear From [MURC73].

In other words, although receptors throughout the body can respond to sufficiently strong
sounds, the ear is by far the most sensitive to sound, which makes it the primary organ of
hearing.

2.  Subjective  Evaluation 

Given that we can hear sounds, how do we rate the quality of sound? What is of good
quality to one person may be of bad quality to another. Rating the quality of sound is
therefore a subjective task, and it depends largely on the rendering capability of the
equipment generating the sound. Content is another aspect of sound quality. For example,
some listeners enjoy rock-and-roll, in which intentional distortion is often heard as a mark
of high quality, whereas others may consider the musical quality of rock-and-roll poor.
Content is an important consideration when conducting sound quality tests of loudspeakers
or headphones. Studies have shown that when conducting sound quality experiments
"...the problem of selecting test material was evident. Relevant test material has not yet
been defined. Different recording techniques influence the assessment of the sound
quality" [THEI86]. Although content is important, this research effort focuses on the
perception of the physical characteristics of sound. But which physical characteristics,
dimensions, or attributes of sound are appropriate to rate?

Zwicker and Zwicker [ZWIC91] propose that:

The information received by our auditory system can be described most effectively in
the three dimensions of specific loudness, critical-band rate, and time. The resulting
three-dimensional pattern is the measure from which the assessment of sound quality can
be achieved. [ZWIC91]

In experiments conducted to identify the perceived sound quality of loudspeakers,
Gabrielsson and Lindstrom had subjects rate music on a category scale from 0 to 10
using the following dimensions: "Clarity, Fullness, Spaciousness, Brightness, Softness,
Absence of Extraneous Sounds, and Fidelity" [GABR85], as depicted in Figure 7.

Figure 7. Sound Quality Rating Scale From [GABR85].

Based on Gabrielsson and Lindstrom's efforts, Toole [TOOL85] expanded the
dimensions on which to rate sound quality to include a specific rating format for spatial
quality, as depicted in Figure 8.

Figure 8. Spatial Quality Rating Scale From [TOOL85].

In  evaluating  the  quality  of  loudspeakers  using  an  impulsive  tone-burst  signal, 
Furmann  et  al.  [FURM90]  had  subjects  rate  the  following  attributes  on  a  scale  of  0-10: 

1) Sharpness: The sound contains components whose mid- and high-frequency levels
are too high.

2) Pureness: The sound is not distorted, devoid of sounds not appearing in the
signal, readable in the entire frequency range.

3) Equalness: The sound retains the proportion of tones; it is linear without
expansion of tones.

4) Clearness: The sound is pure and clear; different instruments and voices can be
distinguished easily; onsets and transients in the music can be perceived easily.

5) Feeling of Space: The reproduction is spacious; the sound is open, has width and
depth, fills the room, gives the impression of the subject's presence in the space
surrounded by sound. [FURM90]

In comparing subjective and objective acoustical measurements, Burkhard and
Genuit [BURK92] recognize that any acoustical measurement system should yield
information that relates to how humans hear. Burkhard and Genuit therefore identify the
relevant parameters involved when a human listener classifies a sound event, as seen in
Figure 9.

Figure 9. Parameters Relevant to Evaluation of Sound by Human Listeners
From [BURK92].

In terms of spatial hearing, Blauert [BLAU97] identifies proven and hypothesized
psychophysical theories corresponding to positional auditory events. These events are
categorized as follows: Basic vs. Supplemental, Homosensory vs. Heterosensory, and
Fixed-position vs. Motional. The physical processes and phenomena that make use of these
psychophysical theories are outlined in Figure 10. For more insight into how humans
perceive the quality of sound, see the following: [BECH90] [TOOL90] [VIEM90] [BURK92]
[THUR92].

E.        VISION


1.  Definition 

A formal definition of vision is as follows:

Vision is a complex phenomenon consisting of several basic components. Light from
external sources is brought to a focus on the retina of the eye. Changes are produced
which initiate electrical impulses. These are conducted over the optic nerve and optic
tract to the brain where the visual sensation is perceived and interpreted. [MCNA68] (see
Figure 11)

Figure 11. The Eye From [MURC73].

2.  Subjective  Evaluation 

An approved method for the subjective evaluation of visual displays can be found
in the Method for the Subjective Assessment of the Quality of Television Pictures,
published by the International Telecommunication Union in Geneva [GENE86]. This
publication recommends a five-point rating scale for evaluating quality. The five
points on the rating scale are as follows: 1 Bad, 2 Poor, 3 Fair, 4 Good, and 5 Excellent.
It also recommends using non-expert observers, at least ten and preferably twenty of them.
In addition, it recommends that an experimental testing session last no more than roughly
30 minutes. It states that a duration of 10 seconds is sufficient for visual stimuli, whether
still or moving sequences. Furthermore, the publication suggests that visual stimuli may be
presented in a randomized-block design derived from Greco-Latin squares. (See
[GOOD95] for an example of the Latin squares technique.)
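Two of the procedures above can be sketched in a few lines under stated assumptions. The first summarizes one condition's ratings on the five-point scale as a mean opinion score. The second builds a Greco-Latin square for an odd number of conditions using the simple modular construction. In that construction, cell (i, j) holds ((i + j) mod n, (2i + j) mod n). This is one standard way to obtain two orthogonal Latin squares, not necessarily the one [GENE86] or [GOOD95] uses. The function names, and the mapping of rows, columns, and symbols to observer groups, positions, and conditions, are hypothetical.

```python
from statistics import mean, stdev

# Five-point quality scale from the recommendation: 1 Bad ... 5 Excellent.
SCALE = {1: "Bad", 2: "Poor", 3: "Fair", 4: "Good", 5: "Excellent"}


def mean_opinion_score(ratings: list[int]) -> tuple[float, float]:
    """Mean and sample standard deviation of the 1-5 ratings for one condition."""
    if any(r not in SCALE for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return float(mean(ratings)), stdev(ratings)


def greco_latin_square(n: int) -> list[list[tuple[int, int]]]:
    """Superimpose two orthogonal Latin squares of odd order n.

    Cell (i, j) holds (a, b) with a = (i + j) mod n and b = (2i + j) mod n.
    For odd n, each symbol appears once per row and column in both squares,
    and every (a, b) pair appears exactly once.
    """
    if n % 2 == 0:
        raise ValueError("this simple construction requires an odd order n")
    return [[((i + j) % n, (2 * i + j) % n) for j in range(n)] for i in range(n)]


# Hypothetical use: rows = observer groups, columns = presentation positions,
# a = visual condition index, b = auditory condition index.
for row in greco_latin_square(3):
    print(row)

# Ten observers, the minimum the recommendation calls for (ratings invented).
score, spread = mean_opinion_score([4, 5, 3, 4, 4, 5, 4, 3, 4, 5])
print(round(score, 2), SCALE[round(score)])  # 4.1 Good
```

Each visual condition pairs with each auditory condition exactly once. Each appears once per group and once per position, which is the balancing property a Greco-Latin design provides.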

After  an  exhaustive  literature  review,  Padmos  and  Milders  [PADM92]  present  a 
long  list  of  quality  criteria  for  simulator  images.  This  list  includes  criteria  based  on: 
Visually  Perceiving  the  Environment,  Physical  Image  Properties,  Image  Capacity, 
Appearance  of  Surfaces,  Visibility  and  Light  Effects,  and  other  miscellaneous  features. 
Although these criteria target the vehicle simulator, they apply equally well to virtually
any type of simulator image.

3.  Visual  Dominance 

The current view of visual dominance can be attributed to the work of Posner et
al. (see [POSN76]). Posner's efforts tried to identify why the visual modality tends to
"dominate conscious judgements about the presence and location of objects" [POSN76].
Posner's general theory of visual dominance includes the following four propositions:

Proposition  1.  Visual  stimuli  are  not  as  automatically  alerting  as  stimuli  in  other 
modalities. 

Proposition  2.  In  order  for  a  visual  event  to  serve  as  an  effective  alerting  stimulus,  the 
subject  must  first  process  it  by  active  attention. 

Proposition  3.  The  consequence  of  active  attention  toward  any  one  modality  is  a 
reduction  in  the  availability  of  the  attentive  mechanisms  to  input  from  other  modalities. 

Proposition  4.  To  compensate  for  the  low  alerting  capability  of  visual  signals,  subjects 
exhibit  a  general  attentional  bias  toward  the  visual  modality  whenever  they  are  likely  to 
receive  reliable  input  from  that  modality.  This  bias  may  not  be  obvious  to  them,  but  it  can 
be  viewed  as  a  strategy  of  a  very  pervasive  sort.  [POSN76] 

F.        ATTENTION 

"The  essence  of  the  concept  of  attention  is  the  focusing  of  awareness" 
[DEMB79]. Our span of attention is derived from our span of perception. Perception
spans the range from subliminal stimuli (unconscious awareness) to liminal stimuli
(conscious awareness), as depicted in Figure 12. In the common searchlight metaphor,
also depicted in Figure 12, the three main aspects of attention in perception are as follows:
1) Selective Attention: corresponds to the direction of the searchlight; 2) Focused
Attention: corresponds to the immediate center of the beam of light cast by the
searchlight; and 3) Divided Attention: corresponds to both the immediate center of the
beam and the fringe just outside it. Overall, attention plays a pivotal role in human
information processing. It not only selects information sources to process but also acts
as a commodity or resource of limited availability [WICK92] (see Figure 13).

Figure 12. The Span of Attention and the
Span of Perception From [DEMB79].

1.  Selective  Attention 

As  the  searchlight  metaphor  explains,  selective  attention  directs  the  searchlight. 
Thus,  selective  attention  is  concerned  with  the  process  of  how,  when,  what,  and  where  we 
actually focus on (or attend to) various and numerous stimuli. The selection process acts
as a kind of filter between sensory processing and attention, as depicted in Figure 14.
Numerous  theories  over  the  years  have  tried  to  describe  the  nature  of  this  selection 
process.  One  of  the  more  popular  theories  is  Broadbent's  Filter  Theory  [BROA58]. 

a.    Broadbent's  Filter  Theory 

Broadbent proposed that the brain contains a selective filter which chooses messages
on the basis of physical characteristics toward which it is "tuned" and rejects others. The
filter spares the limited-capacity system from being overloaded; complex forms of input
are rejected on the basis of simple qualities, and a higher-level analysis of them need not
occur. ...In essence, the filter model views the selective nature of attention as resulting
from restrictions in the capacity of the nervous system to process information.
...Preference is shown for novel or intense events, acoustic over visual signals, sounds of
high frequency, and signals of biological importance to the organism. [DEMB79] (see
Figure 15)

Figure 13. A Model of Human Information Processing From [WICK92].

Figure 14. Selective Attention From [MURC73].

Figure 15. Information-Flow in Broadbent's Filter Theory From [DEMB79].

b.  Filter  Attenuation  Theory 

Although the Filter Theory seemed adequate, a number of studies, primarily
conducted by Anne Treisman [TREI69] [TREI73], soon identified certain limitations. As a
result, the Filter Theory was modified, yielding the Filter Attenuation Theory.

The  essence  of  this  modification  is  that  filtering  is  not  an  all-or-none  affair.  Treisman 
suggested  that  the  filter  does  not  cut  off  rejected  messages  entirely,  but  instead  attenuates 
their  strength.  Thus,  under  some  conditions,  the  weakened  signals  can  still  contact 
higher-level  elements  of  the  perceptual  system.  [DEMB79]  (see  Figure  16) 
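The difference between Broadbent's all-or-none filter and Treisman's attenuating filter can be made concrete with a toy calculation. In this hypothetical sketch, a message's strength is multiplied by a channel gain. The message reaches the higher-level analysis of meaning only if the result meets that word's threshold. The attenuation factor of 0.2 and the threshold values are invented for illustration. The lower threshold for one's own name follows the common reading of Treisman's account that important words need less activation to be recognized.

```python
def broadbent_gain(attended: bool) -> float:
    """All-or-none filter: the rejected channel is cut off entirely."""
    return 1.0 if attended else 0.0


def treisman_gain(attended: bool, attenuation: float = 0.2) -> float:
    """Attenuating filter: the rejected channel is weakened, not removed.

    The default attenuation of 0.2 is an arbitrary illustrative value.
    """
    return 1.0 if attended else attenuation


def reaches_meaning(strength: float, gain: float, threshold: float) -> bool:
    """True if the filtered signal is strong enough to activate the word."""
    return strength * gain >= threshold


# Hypothetical thresholds: ordinary words need more activation than one's own name.
THRESHOLDS = {"ordinary word": 0.5, "own name": 0.1}

for word, threshold in THRESHOLDS.items():
    for name, gain_fn in (("Broadbent", broadbent_gain), ("Treisman", treisman_gain)):
        heard = reaches_meaning(1.0, gain_fn(attended=False), threshold)
        print(f"{name:9} rejected-ear {word}: {'noticed' if heard else 'missed'}")
```

Under the all-or-none filter, nothing presented to the rejected ear gets through. Under attenuation, only the low-threshold item (one's own name) does. Attenuation lets "the weakened signals ... still contact higher-level elements" without the full message being processed.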

c.  Response-Selection  Theory 

An entirely different perspective on selective attention was formalized by
Deutsch and Deutsch [DEUT63]. This theory, called the Response-Selection Theory,
maintains "...that all mental inputs are fully analyzed perceptually and that selection takes
place only when the observer responds to stimuli" [DEMB79].


27 


Reipome 


\     ° 
o   \  o 


Own  nime 
/ 


X 


-Ar 


"  DI«lontr)f  " 
Arulyilt  of  meaning 


'Selective  filter" 


Discrimination  of  pitch. 
Intensity  etc 


Shadowed  'ear  .  Rejected  ear 


Figure  16.  Information  Flow  in  Treisman's 
Filter  Theory  From  [DEMB79]. 


d.    Hybrid  Theory 

Recognizing  the  debate  over  the  various  theories  of  selective  attention 
(which  continues  still  today),  Dember  [DEMB79]  suggests  another  possible  solution  as 
follows: 

It  is  conceivable  that  our  cognitive  capacities  are  more  flexible  than  we  have  been 
willing  to  assume,  and  that  both  perceptual  and  response  selection  can  take  place  under 
appropriate  circumstances.  ...This  new  breed  of  attentional  theory  may  very  well  prove  of 
conceivable  value  in  directing  research  toward  a  more  satisfactory  solution  to  the  mystery 
of  selection  attention.  [DEMB79] 

2.  Divided  Attention 

Whereas selective attention deals with our ability to direct our focus among
stimuli, divided attention deals with our ability to divide our attention among stimuli or
tasks. Divided attention occurs when "the task is to attend to several simultaneously
active input channels or messages, responding to each as needed" [BOFF86]. Early
researchers believed that it was impossible to attend to several simultaneous stimuli, that
is, that attention was indivisible. Today, divided attention is widely accepted, but how we
divide our attention has raised considerable debate. The issue is whether we process
simultaneous inputs in parallel or serially. Considerable research suggests that "...both
modes of processing occur, depending on the task and on the circumstances" [KAHN73].
The mode also depends on whether the stimuli are intramodal or intermodal. Our ability
to divide our attention among various stimuli directly corresponds to our limited ability
to time-share among these stimuli.

3.  Time-Sharing 

Our ability to time-share depends on how efficiently we schedule and switch
between various stimuli. For example, if we are given plenty of time to complete two
separate tasks, we will probably complete one task and then switch to the other. However,
if the amount of time we are given is drastically reduced, we might have to work on both
tasks concurrently. Concurrent processing introduces three further factors that influence
how successfully we can complete the tasks: confusion between tasks, cooperation between
task processes, and competition for task resources. [WICK92]

Confusion  results  when  elements  for  one  task  become  confused  with  the  processing  of 
another  task  because  of  their  similarity. 

Cooperation  occurs  when  there  is  a  high  similarity  of  processing  routines  between 
tasks  which  can  result  in  the  possible  integration  of  the  two  task  elements  into  one. 

Competition,  the  critical  element  of  concurrent  task  time-sharing,  relates  to  the  level 
of  difficulty  between  the  tasks  --  the  greater  the  difficulty,  the  greater  the  competition. 
[WICK92] 

When we say that difficult tasks (stimuli) are in competition with one another, we
mean that they compete for the limited total amount of resources available to complete
them. With this in mind, there are two theories on how resources are allocated to
attention: 1) Single-Resource Theory, and 2) Multiple-Resource Theory.

Figure 17. Single Resource Theory From [WICK92].

a.    Single-Resource  Theory 

The  Single-Resource  Theory  (see  [KAHN73])  argues  that  we  have  one 
single  supply  of  undifferentiated  resources  available  to  all  tasks  and  mental  activities. 
"As  task  demands  increase  either  by  making  a  given  task  more  difficult  or  by  imposing 
additional  tasks,  physiological  arousal  mechanisms  produce  an  increase  in  the  supply  of 
resources" [WICK92]. The Single-Resource Theory is depicted in Figure 17. The main
limitation of this theory is that it compares task difficulty within the same dimensional
constraints. As such, it does not consider the structure of the task as it relates to the
processing of the task, such as its Codes, Modalities, and Stages. [WICK92] Correcting
this limitation provides the impetus for the Multiple-Resource Theory.

Figure 18. Multiple Resource Theory From [WICK92].

b.    Multiple-Resource  Theory 

The Multiple-Resource Theory stipulates that tasks are processed based on
multi-dimensional constraints. These constraints involve the task's Codes (Spatial vs.
Verbal), Modalities (Auditory vs. Visual), and Stages (Encoding, Central Processing, and
Responding), as depicted in Figure 18. As such, "...people have several different
capacities with resource properties. Tasks will interfere more and difficulty-performance
trade-offs will be more likely to occur, if more resources are shared." [WICK92] For
example, two primarily visual tasks may compete for the same resources, resulting in
greater interference (competition) between the two tasks. But if one task is primarily
visual and the other is primarily auditory, they may not have to compete with each other.
They draw on separate resources, as depicted in Figure 18, rather than common resources
as  depicted  in  Figure  17. 
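The claim that tasks interfere more when they share more resources can be illustrated with a simple count. This is a hypothetical sketch, not Wickens' own computational model. Each task is reduced to one value per dimension, following the manual/vocal response split shown in Figure 18. The `TaskDemand` fields and example tasks are assumptions. Predicted interference is just the number of dimensions on which both tasks draw from the same pool.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TaskDemand:
    """Hypothetical single-valued resource profile of a task."""
    modality: str  # "visual" or "auditory"
    code: str      # "spatial" or "verbal"
    response: str  # "manual" or "vocal"


def shared_dimensions(a: TaskDemand, b: TaskDemand) -> int:
    """Number of dimensions on which both tasks draw on the same resource pool."""
    return sum(getattr(a, f.name) == getattr(b, f.name) for f in fields(TaskDemand))


track_display = TaskDemand(modality="visual", code="spatial", response="manual")
read_map = TaskDemand(modality="visual", code="spatial", response="manual")
answer_radio = TaskDemand(modality="auditory", code="verbal", response="vocal")

print(shared_dimensions(track_display, read_map))      # 3: heavy competition expected
print(shared_dimensions(track_display, answer_radio))  # 0: little competition expected
```

The count captures only the qualitative ordering the theory predicts. A fuller model would also weight each task's difficulty, which this sketch ignores.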

4. Sustained Attention

Sustained  attention  deals  with  our  ability  to  maintain  focused  attention  over 
prolonged  time  periods.  Sustained  attention  is  commonly  referred  to  as  vigilance.  During 
the Cold War (1950s through 1980s), there was an increased threat of global
thermonuclear war. Radar operators therefore monitored their radar scopes for potential
incoming missiles for prolonged periods of time (vigilance). Because of the severe
repercussions that could result if a radar or sonar operator missed a blip on the
scope, the study of vigilance became very popular (on both sides of the Cold War). The
results of these studies provided new insights into such theories as Vigilance, Signal
Detection, Expectancy, Arousal, and Habituation. The concept of sustained attention does
not play a role in this dissertation; it is presented here only to complete the discussion of
attention and to clarify the issues of attention that are relevant to this research effort.
During the preliminary literature review of this dissertation, much time was spent
reviewing auditory-visual vigilance studies. For a listing of pertinent auditory-visual
cross-modal signal detection and vigilance research, see APPENDIX B. AUDITORY-
VISUAL CROSS-MODAL SIGNAL DETECTION AND VIGILANCE
BIBLIOGRAPHY.

5.  Cognitive  Ecology  Perspective 

Ecology is the study of the interaction of living creatures with their environment.
For ecological psychology, the focus is the relation of mind to environment. Cognitive
Ecology is a new field, "... a deep ecology of the mind, in which mind and environment
are treated not as separate objects or topics but as codefining poles of experiences and
actions" [FRIE96]. In the book Cognitive Ecology [FRIE96], attention is described as
having two qualitatively different aspects: (1) a clear nucleus, or focus, of attention, and
(2) a fringe to that experience. The focus of attention corresponds to the typical
searchlight metaphor of attention. The fringe refers to:

... many types of experience, such as: (1) feelings of familiarity, (2) feelings of
knowing, such as tip-of-the-tongue-experiences, (3) feelings of relation between objects
or ideas, (4) feelings of action tendency, as in intentions, (5) feelings of expectancy, (6)
feelings of rightness or being on the right track. ...(7) metaknowledge of one's memory or
one's abilities... [and] (8) Perhaps the most pervasive fringe feeling is that of
meaningfulness, that one knows the larger context of any given moment of focal attention
although that context is not part of the content of attention. [FRIE96]

The fringe experience bears on three issues in cognitive ecology: 1) the issue of
knowledge of content, 2) the issue of capacity, and 3) the issue of agency. The second
issue, capacity, identifies potential shortcomings of the traditional view of attention.
Specifically:

Attention  is  normally  viewed  either  explicitly,  or  more  recently  implicitly,  as  a 
limited-capacity  system.  ...This  may  be  because  only  focal  attention  is  normally 
investigated.  A  mind  that  is  defined  literally  as  part  of  its  environment  (the  subjective 
pole  of  attention  in  a  subject-object  field)  should  have  much  broader  attentional 
capacities  than  a  mind  defined  as  separate.  Many  of  the  anomalies  of  attention  and 
consciousness research, such as blindsight and the other agnosias, are cases that violate
the  standard  limited-capacity  conception.  Investigation  of  fringe  phenomena  may  serve  to 
expand,  or  perhaps  undermine,  models  of  attentional  limits.  [FRIE96] 

G.       GESTALT  THEORY 

Gestalt Theory was founded by the German psychologists Max Wertheimer [WERT12], Kurt Koffka [KOFF35], and Wolfgang Köhler [KOHL40]. The basic idea of Gestalt Theory is that we perceive things holistically, as wholes rather than as collections of parts. "Certainly to process information as wholistic or gestalt stimuli rather than as separate elements is an efficient thing for the organism to do - and possibly that is the advantage of gestalt patterns" [GARN70]. To perceive things as wholes rather than as parts, we perceptually organize objects into groups. The Gestalt Factors of Perceptual Organization include the following:

1)  Factor  of  Similarity,  2)  Factor  of  Proximity,  3)  Factor  of  Common  Fate,  4)  Factor 
of  Objective  Set,  5)  Factor  of  Inclusiveness,  6)  Factor  of  Good  Continuation,  7)  Factor  of 
Closure,    8)  Factor  of  Fixation,  9)  Factor  of  Contour,  and  10)  Factor  of  Object 
Interdependence.  [MURC73] 

Gestalt  Theory  was  developed  primarily  to  explain  how  we  perceptually  group  visual 

objects,  but  its  concepts  can  also  be  applied  to  the  other  senses. 
H.       SYNESTHESIA

One  of  today's  leading  experts  in  the  study  of  synesthesia  is  Richard  Cytowic.  He 

defines  synesthesia  as 

...an  involuntary  joining  in  which  the  real  information  of  one  sense  is  accompanied  by 
a  perception  in  another  sense.  In  addition  to  being  involuntary,  this  additional  perception 
is  regarded  by  the  synesthete  as  real,  often  outside  the  body,  instead  of  imagined  in  the 
mind's eye. [CYTO89]

It is estimated that synesthesia occurs in about one in 25,000 individuals [CYTO95], so it is fairly rare. The most common form of synesthesia is colored hearing, in which certain sounds (physical stimuli) evoke perceptions of various colors. For example, when listening to certain classical music, a synesthete might experience shades of blue and/or green. A more bizarre example is gustatory-tactile synesthesia, in which the synesthete experiences (perceives) certain shapes in response to various tastes (physical stimuli) (see Figure 19). In fact, because of the bizarre nature of this condition, Cytowic wrote an entire book based on his research with a man with gustatory-tactile synesthesia. See [CYTO93] for an in-depth review of gustatory-tactile synesthesia.

The concept of synesthesia dates back over two hundred years. For an exhaustive survey of classic and contemporary synesthesia literature spanning this interval, see [BARO96]. The validity of synesthesia, though, has been questioned over the years because reports of it are introspective in nature. However, Cytowic has helped to validate synesthesia by examining its neural substrates, as outlined in [CYTO89] [CYTO93]. The results of Cytowic's research indicate that:

The  synesthetic  experience  may  be  a  result  of  a  fundamentally  mammalian  process  in 
which  the  cortex  briefly  ceases  to  function  in  the  modern  manner,  permitting  the  senses 
to  fuse,  or,  rather,  we  should  say,  perceive  fusion  that  may  be  there  all  along  but  that 
never  arises  to  consciousness.  At  its  essence,  synesthesia  may  be  a  remnant  of  how  early 
mammals  perceived  their  world.  ...Synesthesia  is  what  we  all  do  without  knowing  that  we 
do it, whereas synesthetes do it and know that they do it. [CYTO89]
Figure 19. Tasting Shapes From [CYTO89].


I.        MULTIMEDIA


"According  to  a  recent  projection,  multimedia  and  creative  technologies  will 

represent  a  new  market  of  $40  billion  by  the  year  2000  and  $65  billion  by  the  year  2010" 

[GUPT97]. Multimedia is thus a clear market priority, yet many questions about it remain unanswered. To support its continued growth, multimedia must expand and develop in parallel with internet technology, not as an afterthought or an add-on. Accordingly,

...  the  central  integrated  media-systems-related  issue  that  must  be  addressed  during 
the  next  decade  is  storage,  indexing,  structuring,  manipulating,  and  "discovery"  of 
integrated  multimedia  information  units  (MIUs)  that  include  structured  data  values 
(strings  and  numbers),  text,  images,  audio,  and  video.  The  key  research  focus  in  this  area 
centers  on  managing  multimedia  information  units  in  the  context  of  a  highly  distributed 
and  interconnected  network  of  information  collections  and  repositories.  Current  data  and 
knowledge management technology that addresses collections of formatted data and text
is inadequate to meet the needs of video and audio information, as well as the mixture of
modalities  in  MIUs.  [GUND97] 

In [BLAT96], Blattner and Glinert express the need for a greater understanding of multimodal integration. They correctly recognize that "Although we have seen much progress in recent years in the use of single modalities, the general problem of designing integrated multimodal systems is not well understood" [BLAT96]. One reason for the current lack of integrated multimodal systems is that the system designers, i.e., computer scientists, are often unfamiliar with the issues associated with multimodal concepts. Thus,

...the  (computer)  scientists  who  design  the  new  interfaces  and  human-computer 
communications  devices  must  address  issues  whose  solutions  lie  outside  of  their 
discipline.  Integrating  modalities  requires  understanding  how  people  use  their  various 
senses  to  perceive  and  interact  with  the  world  around  them.  Despite  more  than  100  years 
of  research  into  these  issues,  much  remains  unknown.  [BLAT96] 

As a result, "Research by non-computer scientists shows that computer scientists have sometimes failed to appreciate the distinction between human and computer modalities" [BLAT96]. This may explain why a simulation or virtual environment is typically judged by the auditory and visual rendering capabilities of the system (computer and displays) rather than by how well it stimulates the auditory and visual sensory modalities of the immersed participant, i.e., an engaged human.

Brenda Laurel [LAUR93] provides numerous insights into the use of multimedia and human-computer interaction. She states that "Multiple modalities are desirable only insofar as they are appropriate to the action being represented" [LAUR93]. With her artistic background, Laurel brings a much-needed dimension to the field of multimedia: she correctly recognizes that an artistic touch can lead to better (smarter) multimodal integration in multimedia systems. Accordingly, Laurel states:

But  we  mustn't  fall  prey  to  the  notion  that  more  is  always  better,  or  that  our  task  is  the 
seemingly  impossible  one  of  emulating  the  sensory  and  experimental  bandwidth  of  the 
real  world.  Artistic  selectivity  is  the  countervailing  force  —  capturing  what  is  essential  in 
the  most  effective  and  economic  way.  A  good  line-drawn  animation  can  sometimes  do  a 
better  job  of  capturing  the  movements  of  a  cat  than  a  motion  picture,  and  no  photograph 
will  ever  capture  the  essence  of  light  in  quite  the  same  way  as  the  paintings  of  Monet. 
The point is that first-person sensory and cognitive elements are essential to human-computer
activity. There is a huge difference between an elegant, selective multi-sensory
representation and a representation that squashes sensory variety into a dense but
monolithic  glob  of  text.  [LAUR93] 

Thus,  we  must  not  assume  that  we  always  need  the  best  possible  graphics  and  audio.  The 

particular  application,  overall  sensory  perception,  and  creative  use  of  stimuli  ought  to 

drive  fidelity  requirements. 

J.        SUMMARY 

In  summary,  this  chapter  has  provided  the  computer  scientist  with  a  high-level 
overview  of  Perception,  The  Senses,  Audition,  Vision,  Attention  Theory,  Gestalt  Theory, 
Synesthesia,  and  Multimedia. 




III.  LITERATURE  REVIEW 

A.  INTRODUCTION 

This  chapter  presents  a  literature  review  on  relevant  auditory-visual  cross-modal 
perception  phenomena.  Whereas  the  background  provided  in  the  previous  chapter 
presents  a  general  overview  of  the  concepts  underlying  the  psychological  and 
physiological  nature  of  auditory  and  visual  perception,  this  chapter  specifically  focuses 
on  VEs  and  auditory-visual  intersensory  phenomena.  Using  the  background  provided  in 
the  previous  chapter,  the  reader  can  better  understand  the  theoretical  basis  and  overall 
findings  of  the  numerous  auditory-visual  research  endeavors  outlined  in  this  chapter. 

B.  VIRTUAL  ENVIRONMENTS 

1.  Definition 

The  National  Research  Council's  (NRC)  Committee  on  Virtual  Reality  Research 

and  Development  defines  VE  systems  with  the  following  explanation: 

Virtual  environment  systems  differ  from  other  previously  developed  computer- 
centered  systems  in  the  extent  to  which  real-time  interaction  is  facilitated,  the  perceived 
visual  space  is  three-dimensional  rather  than  two-dimensional,  the  human-machine 
interface  is  multimodal,  and  the  operator  is  immersed  in  the  computer-generated 
environment.  [DURL95] 

But what does virtual mean? Ellis [ELLI96] clarifies the term virtual by introducing the concept of virtualization, the "...process by which a viewer interprets patterned sensory impressions to represent objects in an environment other than that from which the impressions physically originate" [ELLI96]. Ellis further explains that virtualization applies primarily to vision and audition and that there are three levels of virtualization: Virtual Space, Virtual Image, and Virtual Environment, as depicted in Figure 20. Furthermore, because of the diverse nature of VEs, the NRC Committee stresses that VE development entails "...a crucial need for cooperation among many disciplines, including computer science, electrical and mechanical engineering, sensorimotor psychophysics, cognitive psychology, and human factors" [DURL95]. Cross-disciplinary transfer of knowledge is typically lacking, which can degrade VE development. This dissertation attempts to facilitate cross-disciplinary transfer of knowledge and thereby improve VE development with respect to auditory-visual cross-modal perception considerations.

Figure 20. Levels of Visualization From [ELLI96].

2.  Multimodal  Concerns 

"...the  development  of  multimodal  synthetic  environments  is  an  extremely 
important and challenging endeavor. [It] ...requires that we carefully examine our current
assumptions concerning VE architectural requirements and design constraints"
[DURL95]. One of the first multimodal networked VEs was Networked SPIDAR [ISHI94]. In this networked VE, participants collaborated on the design of 3D objects using visual, audio, and haptic information. The developers of Networked SPIDAR believed that "A networked virtual environment must support these interactions [visual, audio, and haptic] without contradiction in either time or space" [ISHI94]. Gupta et al. [GUPT97] also describe experiments using multimodal environments to enhance computer-aided design (CAD). They describe the relationship of the human participant to auditory, visual, and haptic feedback devices, as depicted in Figure 21.

Figure 21. Multimodal Modes in Virtual Environments From [GUPT97].

However, the majority of research and development in VEs has typically focused on the sense of vision (i.e., the visual channel). Accordingly:

To  date  much  of  the  design  emphasis  in  VE  systems  has  been  dictated  by  the 
constraints  imposed  by  generating  the  visual  scene.  The  nonvisual  modalities  have  been 
relegated  to  special-purpose  peripheral  devices.  ...However,  many  of  the  issues  involved, 
in  the  modeling  and  generation  of  acoustic  and  haptic  images  are  similar  to  the  visual 
domain;  the  implementation  requirements  for  interacting,  navigating,  and  communicating 
in  a  virtual  world  are  common  to  all  modalities.  Such  multimodal  issues  will  no  doubt 
tend  to  be  merged  into  a  more  unitary  computational  system  as  the  technology  advances 
over  time.  [DURL95] 

Thus, proper VE development must focus on all modalities equally. This focus should extend beyond the relationships within each modality to the relationships between modalities. As the NRC Committee explains: "Detailed study of both intrasensory and intersensory illusions is important because, in many cases, the existence of illusions enables SE [synthetic environment] systems design to be simplified and therefore to increase its cost-effectiveness" [DURL95]. Furthermore, under the category of Psychological Considerations, the NRC Committee recommends further study of "...channel-interaction effects that occur with multimodal interfaces." Notable examples of research on channel-interaction (intersensory) effects:

...include  those  on  the  dominance  of  vision  over  audition  and  haptics  in  cases  of 
intermodality  conflict  (e.g.,  as  evidenced  in  the  ventriloquist  effect)  and  on  the  use  of 
auditory  stimuli  to  improve  the  perception  of  events  that  are  represented  primarily  in  the 
visual  or  haptic  domains  (as  in  the  use  of  sound  effects)  [DURL95]. 

By this point it should be clear that proper development of VEs must consider multimodal factors. Since we currently have the technology to render very high-quality auditory and visual displays, the use of this technology must not neglect potential auditory-visual cross-modal perception phenomena. Brenda Laurel makes the point that auditory-visual cross-modal issues have always been a consideration in the art world. With the recent surge in VE technology development, the same cross-modal considerations from the arts now apply to VEs. Laurel states:

VR  has  reinvigorated  and  recontextualized  the  study  of  human  sensation  and 
perception.  While  much  is  known  about  the  human  visual  or  auditory  or  tactile  senses, 
relatively  little  is  known  "scientifically"  about  how  these  senses  combine.  Still  less  is 
known  about  how  they  combine  in  the  context  of  representations,  as  opposed  to  the 
context  of  the  actual  world.  For  example,  it  is  well  known  in  the  folklore  of  computer 
game  design  that  high-quality  audio  makes  people  perceive  visual  displays  to  have  higher 
resolution.  It  is  also  well-known  that  the  converse  is  not  true:  Great  graphics  will  not  turn 
a PC's beeps and boops into Beethoven. The study of sensory combinatorics, that is, how
vision  affects  audition  or  how  the  two  in  concert  affect  emotion,  was  almost  exclusively 
the  province  of  the  arts  until  VR  came  on  the  scene.  [LAUR93] 

3.  Fidelity  Requirement 

What  are  the  fidelity  requirements  of  a  VE?  First  and  foremost  (and  sometimes 
neglected),  the  intended  outcomes  of  the  particular  application  ought  to  drive  the  fidelity 
requirements.  For  example,  the  visual  fidelity  of  a  VE  intended  to  train  surgeons  in  open- 
heart  surgery  probably  needs  to  be  greater  than  the  visual  fidelity  of  a  VE  intended  to 
teach  children  how  to  read.  Another  consideration  is  that  of  the  human  sensory  system: 
the fidelity requirements of VEs need not exceed those of the human perceptual system. As
such, "Knowledge of normal human resolving power on the input side, i.e., the sensory
side, allows one to predict the display resolution beyond which finer resolution cannot be
perceived and would therefore be wasted" [DURL95]. For example, the auditory fidelity
of many VEs, in terms of frequency range, need not exceed the nominal range of
human hearing (i.e., 20 Hz - 20 kHz). One caveat applies: some research indicates that
our perceptual frequency range extends well beyond this (see [OOHA91] [BOYK97]).
Nevertheless,  the  capabilities  of  the  human  sensory  system  ought  to  drive  the  fidelity 
requirements  of  VEs  as  depicted  in  Figure  22. 
Figure 22. Computer Technology Organization for Virtual Reality From [DURL95].
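The resolving-power argument above can be illustrated with a little arithmetic. The sketch below is an illustration only and is not drawn from [DURL95] or [BARF95]: it applies the standard Nyquist sampling criterion (a sampled signal can represent frequencies only up to half its sampling rate) to the nominal 20 Hz - 20 kHz hearing range, and a simple pixel count to vision under the assumption of roughly 1 arcminute of visual acuity, a common textbook figure for normal vision. The function names, the 60-degree field of view, and the 44.1 kHz example rate are hypothetical choices for illustration, not parameters of the experiments in this dissertation.

```python
def min_sampling_rate_hz(max_frequency_hz: float) -> float:
    """Nyquist criterion: to reproduce content up to max_frequency_hz, sample at
    more than twice that frequency; 2x is the theoretical lower bound returned."""
    if max_frequency_hz <= 0:
        raise ValueError("max_frequency_hz must be positive")
    return 2.0 * max_frequency_hz


def nyquist_limit_hz(sampling_rate_hz: float) -> float:
    """Highest frequency a given sampling rate can represent without aliasing."""
    return sampling_rate_hz / 2.0


def max_useful_pixels(field_of_view_deg: float, acuity_arcmin: float = 1.0) -> int:
    """Pixels across a field of view beyond which, under the ASSUMED acuity limit,
    extra pixels would be finer than the eye can resolve (60 arcmin per degree)."""
    return round(field_of_view_deg * 60.0 / acuity_arcmin)


print(min_sampling_rate_hz(20_000))  # 40000.0  (nominal upper limit of hearing)
print(nyquist_limit_hz(44_100))      # 22050.0  (CD-rate audio)
print(max_useful_pixels(60.0))       # 3600     (hypothetical 60-degree display)
```

Under these assumptions, sampling well above about 40 kHz, or packing more than about 60 pixels into each degree of visual angle, adds little that a listener or viewer with nominal capabilities could perceive, subject to the caveat above about hearing beyond 20 kHz.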


Details regarding humans' ability to detect and discriminate visual, auditory,
tactile, and kinesthetic information, along with corresponding technical specifications of
VE equipment, are presented in the excellent paper by Barfield et al. [BARF95]. Barfield
states  that  "It  is  important  to  have  a  thorough  understanding  of  the  capabilities  of  the 
human's  sensory  systems  and  to  use  this  knowledge  in  the  design  of  virtual  worlds  and  in 
deriving  technical  specifications  for  virtual  environment  equipment"  [BARF95]. 
When Barfield compares the human sensory system with technical specifications of VEs, he considers the modalities as separate entities. However, the VE participant, being human, is multimodal by nature. As a result, one key consideration neglected in Barfield's paper is how the senses interact; another is whether this sensory interaction conflicts with VE specifications derived from the capabilities of each modality considered alone. The NRC Committee also recognizes that visual fidelity requirements are influenced by other modalities and that a greater understanding of multimodal integration is needed to answer the following open questions:

How  are  the  required  visual  display  system  parameters  affected  within  multimodal 
systems?  Can  visual  display  system  requirements  be  relaxed  in  multimodal  display 
environments?  What  are  the  perceptual  effects  associated  with  the  merging  of  displays 
from  different  display  sources?  [DURL95] 

One  factor  in  considering  auditory  and  visual  fidelity  requirements  is  that  of  display 

resolution.  In  a  VE,  the  auditory  and  visual  resolutions  ought  to  be  properly  matched.  As 

Brenda  Laurel  correctly  states: 

...  we  also  sometimes  expect  certain  kinds  of  patterns  to  occur.  Although,  there  are 
many  reasons  for  emphasizing  one  modality  over  another,  we  tend  to  expect  that  the 
modalities  involved  in  a  representation  will  have  roughly  the  same  "resolution."  A 
simplistic  cartoon-style  animation  with  naturalistic  character  voices  and  environment 
sounds,  for  instance,  seems  out  of  whack.  A  computer  game  that  incorporates 
breathtakingly  high-resolution,  high-speed  animation  but  only  produces  little  beeps  seems 
brain-damaged.  [LAUR93] 

In analyzing the use of performed sound and music in VEs, Pressing [PRES97] classified sound into three categories: 1) artistic expression, 2) information transfer, and 3) environmental sounds. Pressing concluded that "Across all three categories the need for further research on the psychological aspects of sound and performance in virtual environments was apparent" [PRES97]. Another fidelity consideration is that "...cartoons and caricatures, despite their drastic loss of information and fidelity, may better serve to represent the world, clarify visual relationships... and effect our thoughts... than pictures of high fidelity" [FRIE96]. Similarly, on integrating sounds and motions in VEs, "Sounds tend to affect the listener in a more subconscious and impressionistic way than visual cues" [HAHN98]. Furthermore, there are many perspectives from which to view the fidelity requirements of VEs, perhaps all of which are valid. Flach and Holden [FLAC98] outline the following definitions of fidelity from various scientific perspectives.

1)  Newton's  Way:  Fidelity  is  derived  from  three-dimensional  space  and  time  (e.g., 
chronometric  analysis). 

2)  Einstein's  Way:  Since  space  and  time  are  relative  to  a  certain  frame  of  reference, 
they  cannot  be  scientifically  committed  to  any  sense  of  realism;  therefore,  space  and  time 
cannot  be  used  as  a  measure  of  fidelity. 

3)  Fechner's  Way:  Fidelity  is  defined  in  relation  to  the  correspondence  between  the 
simulated  world  and  the  "real"  world  as  measured  using  the  ruler  and  clock  of  classical 
physics. 

4)  Helmholtz's  Way:  Fidelity  is  defined  relative  to  the  ability  to  simulate  the 
biological  mechanisms  —  the  proximal  stimulus.  Thus,  binocular  and  binaural  inputs 
might  be  considered  essential  to  a  high-fidelity  experience  of  space. 

5)  Broadbent's  Way:  Information  processing  rate,  sensitivity,  bias,  and  stability  might 
prove  the  best  measures  of  fidelity. 

6)  Dewey's  Way:  The  measure  of  fidelity  is  the  degree  to  which  the  simulation 
captures  the  richness  of  natural  couplings  between  perception  and  action. 

7)  Gibson's  Way:    With  fidelity,  the  constraints  on  action  take  precedence  over  the 
constraints  on  perception,  and  reality  of  experience  is  defined  relative  to  functionality, 
rather  than  to  appearances.  (Paraphrased  from  [FLAC98]) 

4.  Presence 

Presence, the sense of being there, has been a heavily debated topic among VE developers. There is general agreement that the sense of presence is a vital aspect of any VE, and that "...virtual environments that are best at simulating multiple senses are also best at evoking a feeling of presence and immersion" [ANDE97]. The debate over presence concerns its definition and measurement. Depending on one's interpretation, presence can have many possible meanings. For instance, a well-written book can immerse a reader in the intricacies of a good plot. A great live theater production or film can also stir the senses, causing a sense of being there: presence. In VE applications, we typically measure presence by how well our senses (all of them) are stimulated, for "...it is both the interactivity and the quality of the rendering that results in the immersiveness of a virtual reality or multimedia system" [BEGA94]. Sheridan [SHERI96] makes the interesting observation that through evolution our senses developed in order, from tactile to vision to audition, but that the technology used to stimulate our senses has developed in reverse, from audition to vision to tactile, as depicted in Figure 23.
Figure 23. Darwinian Vs. Technological Evolution From [SHERI96].


In VE applications, most agree that the level of presence increases with the level of audio, visual, and tactile fidelity. Accordingly, "Tight linkage between visual, kinesthetic, and auditory modalities is the key to the sense of immersion that is created by many computer games, simulations, and virtual-reality systems" [LAUR93]. Thus, the level of presence can be regarded as a function of fidelity. Nevertheless, there is little agreement on how to measure the level of presence. Sheridan uses the following Three Attribute Scale of Presence to rate the fidelity of picture, sound, and tactile images.

1. Virtual image resolution (pixels or taxels per frame), refresh rate (frames per second), and gray- or color-scale (bits per pixel or taxel) are too few to convey realism.

2. Virtual image fidelity is fairly realistic. Resolution (pixels or taxels per frame), refresh rate (frames per second), and gray- or color-scale (bits per pixel or taxel) are enough to convey a good sense of reality.

3. Virtual image is compelling. Difficult to discriminate the virtual from the real based on any given image. [SHERI96]




Slater and Wilbur [SLAT97] discuss various parameters affecting presence, including vividness as it relates to pictorial realism. They describe an experiment using a driving simulator in which two different levels of pictorial realism were presented to the immersed participant. The results indicated that: "There was a significant difference in the level of reported presence between the two levels of pictorial realism, with the more realistic resulting in a higher level of reported presence" [SLAT97]. As a result of their research, Slater and Wilbur introduce the Framework for Immersive Virtual Environments (FIVE), which relates presence to several factors, including visual, auditory, and tactile displays, as depicted in Figure 24.
Also,  in  a  previous  research  effort  [SLAT94],  Slater  found  that  a  person's  dominant  sense 
may  influence  a  person's  sense  of  presence. 


Figure 24. Framework for Immersive Virtual Environments From [SLAT97].

Hendrix [HEND94] [HEND96a] [HEND96b] conducted a number of experiments to measure the level of presence within VEs during a navigation task as a function of visual and audio display parameters. In one set of experiments, the visual display parameters manipulated were: 1) presence or absence of head tracking, 2) presence or absence of stereoscopic cues, and 3) size of the geometric field of view used to create the visual image projected on the visual display. In another set of experiments, the audio display parameters manipulated were: 1) presence or absence of spatialized sound, and 2) nonspatialized versus spatialized sound. The experiments involving visual display parameter manipulation found "...a significant positive correlation between the reported level of presence and the fidelity of the interaction between the virtual environment participant and the virtual world" [HEND96a]. The results from the experiments involving audio display parameter manipulation indicated that:

...the  addition  of  spatialized  sounds  significantly  increased  the  sense  of  presence  but 
not  the  realism  of  the  virtual  environment.  Despite  this  outcome,  the  addition  of  a 
spatialized  sound  source  significantly  increased  the  realism  with  which  the  subjects 
interacted  with  the  sound  source,  and  significantly  increased  the  sense  that  sounds 
emanated  from  specific  locations  within  the  virtual  environment.  The  results  suggest  that, 
in  the  context  of  a  navigation  task,  while  presence  in  virtual  environments  can  be 
improved  by  the  addition  of  auditory  cues,  the  perceived  realism  of  a  virtual  environment 
may  be  influenced  more  by  changes  in  the  visual  rather  than  auditory  display  media. 
[HEND96b] 

Thus, although spatialized sounds can increase the sense of presence within a VE, the perception of realism in a VE appears to be dominated by the visual modality, at least in the context of a navigation task.

C.       AUDITORY-VISUAL  PERCEPTUAL  ORGANIZATION 

1.  Gestalt  Theory 

The  perception  of  an  auditory-visual  display  can  be  considered  in  terms  of  the 
Gestalt  point  of  view.  If  we  extend  the  Gestalt  Factors  of  Perceptual  Organization 
discussed  earlier  in  GESTALT  THEORY  (Chapter  II,  Section  G)  from  visual-only 
stimuli to visual and audio stimuli, the factors of Similarity, Proximity, Fixation, and
Object  Interdependence  become  particularly  interesting  to  the  possible  perceptual 
grouping  of  an  auditory-visual  display.  The  definitions  of  these  (visual)  factors  are  as 
follows: 

Similarity:  If  a  number  of  elements  are  present  in  the  perceptual  field,  those  with 
similar  characteristics  will  be  seen  as  though  they  are  grouped  together. 
Proximity: Elements of the perceptual field located near one another will tend to be
seen  as  a  group  or  unit. 

Fixation:  The  organization  of  certain  kinds  of  patterns  clearly  depends  on  where  the 
observer  fixes  his  attention. 

Object  Interdependence:  ...prevalent  in  the  organization  of  complex  patterns 
encountered  in  visual  experience  is  a  tendency  to  group  objects  that  are  functionally 
rather  than  physically  similar.  We  frequently  see  objects  in  this  way  if  they  display  some 
kind  of  interdependent  relationship.  [MURC73] 

When a high-quality visual display is coupled with a high-quality auditory display to present an audio-visual display, the factor of Similarity may cause the two displays to be perceptually grouped by quality. Also, through the perceptual illusion of the ventriloquism effect, the audio portion of an audio-visual display may be perceived as emanating from the location of the visual display, perhaps causing a perceptual grouping based on the factor of Proximity. When viewing any audio-visual display, the observer must at some point fixate on the display, which in turn might cause a perceptual grouping by the factor of Fixation. Furthermore, since it is typical to hear music playing on a radio, music (audio) and a radio (visual) may be perceptually grouped together through the factor of Object Interdependence.

2.  Auditory  Scene  Analysis 

In terms of auditory-visual interaction, Al Bregman notes in his book Auditory Scene Analysis: The Perceptual Organization of Sound that there are many similarities between visual and auditory perceptual groupings. Specifically,

...  the  similarity  of  principles  of  organization  in  the  visual  and  auditory  modalities  is 
that  the  two  seem  to  interact  to  specify  the  nature  of  an  event  in  the  environment  of  the 
perceiver.  This  is  not  too  surprising,  since  the  two  senses  live  in  the  same  world  and  it  is 
often  the  case  that  an  event  that  is  of  interest  can  be  heard  as  well  as  seen.  Both  senses 
must  participate  in  making  decisions  of  "how  many,"  of  "where,"  and  of  "what." 
[BREG90]

In contrast to the Gestalt point of view, which focuses on the similarities among modalities, Bregman also presents an interesting ecological point of view which focuses on the differences between the modalities.

There  is  a  crucial  difference  in  the  way  that  humans  use  acoustic  and  light  energy  to 
obtain  information  about  the  world.  This  has  to  do  with  the  dissimilarities  in  the  ecology 
of light and sound. In audition humans, unlike their relatives the bats, make use primarily
of  the  sound-emitting  rather  than  the  sound-reflecting  properties  of  things.  They  use  their 
eyes  to  determine  the  shape  and  size  of  a  car  on  the  road  by  the  way  in  which  its  surfaces 
reflect  the  light  of  the  sun,  but  use  their  ears  to  determine  the  intensity  of  the  crash  by 
receiving  the  energy  that  is  emitted  when  this  event  occurs.  The  shape  reflects  energy;  the 
crash  creates  it.  For  humans,  sound  serves  to  supplement  vision  by  supplying  information 
about  the  nature  of  events,  defining  the  "energetics"  of  a  situation.  [BREG90] 

This difference between vision and audition is further evidenced by the use of echoes. In audition, we are mainly interested in the direct source of sound rather than its echoes, but we can also combine direct sound and indirect sound (echoes) to establish a mixed sound which still conveys information about the direct sound but with the additional properties (i.e., reverberation) of the indirect sound. With vision, however, we are mainly concerned with the indirect image (echoes or reflections), and we are not able to combine direct and indirect images to establish a mixed visual image. Bregman suggests that it is these ecological differences which might cause "apparent violations of the principle of exclusive allocation of sensory evidence" [BREG90].

D.       AUDITORY-VISUAL  ART  FORMS  AND  FILM 

1.  Art  Forms 

In terms of the arts, Joseph Schillinger explains the correlation of visual and auditory art forms through mathematics. Schillinger believed that:

A  scientific  theory  of  the  arts  must  deal  with  the  relationship  that  develops  between 
works  of  art  as  they  exist  in  their  physical  forms  and  emotional  responses  as  they  exist  in 
their  psycho-physiological  form,  i.e.,  between  the  forms  of  excitors  and  the  forms  of 
reaction.  As  long  as  an  art-form  manifests  itself  through  a  physical  medium,  and  is 
perceived  through  an  organ  of  sensation,  memory  and  associative  orientation,  it  is  a 
measurable  quantity.  Measurable  quantities  are  subject  to  the  laws  of  mathematics.  Thus, 
analysis  of  esthetic  form  requires  mathematical  techniques,  and  the  synthesis  of  forms 
(the  realization  of  forms  in  an  art  medium)  requires  the  technique  of  engineering. 
[SCHI48] 

Schillinger referred to the visual art form as Elements of Visual Kinetic Composition and the auditory art form as Elements of Music. The Elements of Visual Kinetic Composition consisted of the following four main components:

1.  Linear,  plane  and  solid  trajectories  (distance,  dimension,  direction,  form). 
2. Illumination (forms and intensity of light).

3.  Texture  (density  of  matter,  quality  of  surface). 

4.  General  component:  time.  [SCHI48] 

The  Elements  of  Music  consisted  of  the  following  five  main  components: 

1.  Frequency  (pitch). 

2.  Intensity  (relative  dynamics). 

3.  Quality  (harmonic  composition). 

4.  Density  (quantitative  aggregation  of  sound). 

5.  General  component:  time.  [SCHI48] 

As  such,  Schillinger  believed  that  mathematics  might  appropriately  describe  visual  and 
auditory  correlated  art  forms  and  that  "The  correlation  of  the  general  component  in  both 
art  forms  may  be  assigned  to  different  proportionate  relations,  such  as  harmonic  ratios, 
distributive  powers,  series  of  growth,  etc."  [SCHI48].  Some  of  these  mathematical 
relations  which  describe  art  forms  are  depicted  in  Figure  25. 


$$\frac{\text{quality of matter's surface}}{\text{pitch} + \text{relative dynamics}} \qquad \frac{\text{quality of matter's surface}}{\text{relative dynamics} + \text{harmonic composition}}$$

$$\frac{\text{quality of matter's surface}}{\text{harmonic composition} + \text{quantitative aggregation of sound}} \qquad \frac{\text{quality of matter's surface}}{\text{quantitative aggregation of sound} + \text{pitch}}$$

Figure 25. Combined Visual-Auditory Art Form Mathematics From [SCHI48].

Furthermore, Figure 26 depicts Schillinger's concept of the overall relationship among the components of a combined kinetic art form.

Figure 26. Components of a Combined Kinetic Art Form From [SCHI48].

2. Film

For many years, the entertainment industry has recognized the important relationship between visuals and sound. Even before sound was an integral part of film, silent movies were accompanied by specific music to enhance the mood of certain scenes. As Gary Rydstrom of Skywalker Sound explains:

Storytelling, mood setting, character development, drama and style can all be more
successfully realized by the careful collaboration of images and sounds. There is a
magical level reached when picture and sound work together, a creative dimension not
reached by either picture or sound alone. ...When approached creatively, the combination
of sound and image can bring something to vivid life, clarify the intent of the work, and
make the whole experience more memorable. [RYDS94]

Recognizing this important relationship between visuals and sound in film, Lipscomb and Kendall [LIPS90] [LIPS94] investigated the perceptual judgement of the relationship between musical and visual components in film. In their experiments, they took various motion picture sequences and manipulated their soundtracks. Each sequence was presented to subjects with its original soundtrack and with several manipulated soundtracks, and the subject's task was to select the soundtrack that best fit the visuals of the film. Interestingly, the results indicated that "the composer-intended musical score [the original score] was identified as the best fit by the majority of subjects for all conditions" [LIPS94]. In a related experiment, they also found significant results strongly suggesting that a musical soundtrack can in fact change the perceived meaning of a film presentation.

E.        AUDITORY-VISUAL  CROSS-MODAL  MATCHING 

Cross-modal matching is the use of information obtained through one sensory modality to make a judgment about an equivalent stimulus from another modality. Lawrence Marks has been studying auditory-visual cross-modal matching for the last twenty-five years. He has conducted several experiments which suggest strong auditory-visual cross-modal matching among brightness, pitch, and loudness. In 1974 [MARK74], he had subjects match pure tones to the brightness of gray surfaces. His results indicated that most subjects matched increasing auditory pitch to increasing visual brightness. Marks further concluded that his findings "...mimic those of synesthesia..." [MARK74] (see SYNESTHESIA, Chapter II, Section H). In 1982 [MARK82], Marks conducted a series of four experiments in which subjects used scales of loudness, pitch, and brightness to evaluate the meanings of various auditory-visual synesthetic metaphors, such as "sound of sunset," "murmur of dawn," and "bright whisper." He found that loudness and pitch expressed themselves metaphorically as greater brightness, and likewise, that brightness expressed itself metaphorically as greater loudness and as higher pitch. This series of experiments led Marks to believe that:

The ways that people evaluate synesthetic metaphors emulate the characteristics of
synesthetic  perception,  thereby  suggesting  that  synesthesia  in  perception  and  synesthesia 
in  language  both  may  emulate  from  the  same  source  —  from  a  phenomenological 
similarity  in  the  makeup  of  sensory  experiences  of  different  modalities.  [MARK82] 

Marks  has  also  conducted  experiments  involving  auditory-visual  cross-modal  perception 
of  intensity  [MARK86],  auditory-visual  cross-modal  similarities  in  speeded 
discrimination  [MARK87],  and  additional  experiments  concerning  auditory-visual  cross- 
modal  similarities  with  pitch,  loudness,  and  brightness  [MARK89].  The  results  of  these 
experiments  are  similar  to  his  earlier  experiments  and  provide  more  evidence  to  support 
strong auditory-visual cross-modal matching among pitch, loudness, and brightness. In terms of cross-modal matching, one might conclude from Marks' findings that our senses are somehow integrated. However, Stein and Meredith offer a different, neurological perspective:

While  cross-modal  matching  is  clearly  an  intersensory  phenomenon,  and  may  involve 
multisensory  neurons,  one  could  make  the  case  that  it  has  little  to  do  with  the  integration 
of  inputs  from  different  modalities  per  se,  and  that  multisensory  areas  of  the  brain  need 
not  play  any  special  role  in  this  process.  The  judgments  of  equivalence  across  modalities 
could  depend  on  the  individual  inputs  being  held  in  the  central  nervous  system  in 
modality-specific  form,  so  that  they  are  independent  of  one  another  but  still  may  be 
accessed  by  another  neural  pool.  [STEI93] 

F.        VISUAL  DOMINANCE  OVER  AUDITION 


1.  Ventriloquism  Effect 

A  well-known  auditory-visual  intersensory  phenomenon  is  that  of  the 
Ventriloquism  Effect  (see  [HOWA66]).  As  the  name  implies,  this  phenomenon  refers  to 
the  illusion  created  by  a  skilled  ventriloquist  when  we  think  we  hear  the  dummy  talking, 
when  in  fact  we  are  actually  hearing  the  altered  voice  of  the  ventriloquist.  Not  only  do  we 
hear  the  dummy  talking  but  we  actually  think  the  sounds  of  the  dummy  are  emanating 
from the dummy's mouth and not from the ventriloquist, even though we know that the
dummy cannot really talk (see Figure 27). This effect demonstrates the strong
spatial coupling that occurs between the auditory and visual senses, and as a result has
been the topic of much research (see [HOWA66] [PICK69] [BERM76] [RADE76]
[WARR81] [RAGO88] [STEI93]). One reason why the ventriloquism effect occurs is
that  the  visual  sense  is  usually  the  dominant  sense  as  discussed  earlier  in  Visual 
Dominance  (Chapter  II,  Section  E).  As  a  result,  "...unless  there  are  dramatic  differences 
in  the  intensities  of  different  stimuli,  the  visual  effect  on  the  information  generated  in 
most  other  sensory  systems  is  greater  than  their  effect  on  visual  perception"  [STEI93]. 
Therefore: 

...if visual stimuli are appearing at the same frequency and providing information of
the same general type or importance as auditory or proprioceptive stimuli, biases toward
the visual source at the expense of the other two [auditory and proprioceptive] will be
expected [WICK92].

Figure 27. The Ventriloquist From [STEI93].

2.  Experimental  Results  Supporting  the  Ventriloquism  Effect 

Radeau  and  Bertelson  [RADE76]  conducted  an  experiment  on  the  effect  of  a 
textured  visual  field  on  modality  dominance  during  the  ventriloquism  effect.  The  results 
indicated  that  "...visual  texture  affects  the  degree  of  auditory  capture  of  vision,  but  not  the 
degree  of  visual  capture  of  audition..."  [RADE76].  Bermant  and  Welch  [BERM76] 
investigated  the  effect  of  degree  of  separation  of  an  audio-visual  stimulus  and  eye 
position  upon  the  spatial  interaction  of  the  ventriloquism  effect.  One  of  the  more 
interesting  results  of  this  study  was  that  "...the  ventriloquism  effect  is  not  dependent  on 
the  use  of  a  visual  source  which  has  been  experimentally  associated  with  the  production 
of  sounds"  [BERM76].  The  role  of  auditory-visual  compellingness  in  the  ventriloquism 
effect was studied by Warren et al. [WARR81], who found that given a highly
compelling  stimulus  situation,  "...subjects  showed  a  very  high  visual  bias  of  audition,  a 
significant  auditory  bias  of  vision,  and  a  sum  of  bias  effects  that  indicated  that  their 
perception was fully consonant with the assumption of a single perceptual event" [WARR81]. Ragot et al. [RAGO88] explored reciprocal effects in auditory and visual ventriloquism. Their findings suggested that "...visual dominance appears when attention is divided between visual and auditory modalities, but seems to be absent... when the subjects are asked to attend to one modality while knowing the other" [RAGO88]. Knudsen and Brainard [KNUD95] present neurological evidence from studies of the optic tectum (also referred to as the superior colliculus). This evidence offers a neural explanation of the ventriloquism effect that supports visual dominance over audition. They conclude that:

The  angular  [spatial]  distance  that  can  separate  visual  and  auditory  stimuli  and  still 
result  in  facilitatory  interactions  in  tectal  neurons  depends  on  the  sizes  of  their  visual  and 
auditory  receptive  fields.  Because  visual  receptive  fields  are  consistently  smaller  than 
auditory  receptive  fields, ...bimodal  tectal  neurons  are  more  sensitive  to  displacements  of 
a  visual  stimulus  from  its  optimal  location  than  to  displacements  of  an  auditory  stimulus. 
As  a  consequence,  the  site  in  the  bimodal  tectal  map  that  is  activated  by  visual  and 
auditory  stimuli  should  be  more  sensitive  to  the  location  of  the  visual  stimulus  than  to  the 
location  of  the  auditory  stimulus.  [KNUD95] 
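The quoted argument, that a bimodal tectal response should track the visual location more closely because visual receptive fields are narrower, can be illustrated with a toy calculation. The sketch below is an illustrative assumption only, not Knudsen and Brainard's model: it treats each unimodal response as a Gaussian activity profile over tectal position (arbitrary units), assumes the bimodal response is the product of the two profiles, and uses hypothetical function names and widths. Under those assumptions the combined peak is a precision-weighted average of the two unimodal peaks, so it lies between them but closer to the narrower (visual) one, which matches the qualitative pattern described for the bottom panel of Figure 28.

```python
import math


def activity(x, center, width):
    """Unnormalized Gaussian activity profile at tectal position x (arbitrary units)."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def combined_peak(x_visual, w_visual, x_auditory, w_auditory):
    """Peak of the product of two Gaussian profiles (hypothetical combination rule).

    The product of two Gaussians peaks at the precision-weighted mean of their
    centers, where precision = 1 / width**2.
    """
    p_visual = 1.0 / w_visual ** 2      # narrow visual field -> high precision
    p_auditory = 1.0 / w_auditory ** 2  # broad auditory field -> low precision
    return (p_visual * x_visual + p_auditory * x_auditory) / (p_visual + p_auditory)


def scan_peak(x_visual, w_visual, x_auditory, w_auditory, step=0.01):
    """Numerically locate the peak of the product by scanning tectal positions."""
    margin = 5.0 * max(w_visual, w_auditory)
    lo = min(x_visual, x_auditory) - margin
    hi = max(x_visual, x_auditory) + margin
    best_x, best_a = lo, -1.0
    for i in range(int((hi - lo) / step) + 1):
        x = lo + i * step
        a = activity(x, x_visual, w_visual) * activity(x, x_auditory, w_auditory)
        if a > best_a:
            best_x, best_a = x, a
    return best_x


if __name__ == "__main__":
    # Hypothetical values: frontal visual stimulus at 0 with a narrow profile,
    # peripheral auditory stimulus at 10 with a profile three times as wide.
    print(combined_peak(0.0, 1.0, 10.0, 3.0))
    print(round(scan_peak(0.0, 1.0, 10.0, 3.0), 2))
```

With these made-up widths, both the analytic and the scanned peak land at about 1.0, between the unimodal peaks at 0 and 10 but far closer to the visual one; with equal widths the combined peak would fall at the midpoint.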

Knudsen and Brainard believe that the behavioral correlates of this neurological evidence support increased sensitivity and localization activity when stimuli contain both visual and auditory components. Figure 28 depicts the hypothetical neural representations on the tectal surface that occur with spatially separate auditory and visual stimuli.

Hypothetical neural representations of spatially separate visual and auditory stimuli
(bottom), schematically illustrated on a plane representing the tectal surface. The relative activity of
different tectal loci is indicated by the relative height above the plane. Neurons located outside of
the zones of excited neurons are inhibited (not shown) by the stimulus. Top: A frontal visual stimulus
results in a sharp peak of activity centered in the rostral (R) tectum. Middle: An auditory stimulus
located more peripherally results in a peak of activity centered further caudal (C) in the tectum. The
peak is broader because auditory receptive fields are much larger than visual receptive fields.
Bottom: The combination of visual and auditory stimuli results in a single peak of activity located
between the peaks for the unimodal stimuli but biased towards the location at which the visual
stimulus was represented.

Figure 28. Hypothetical Neural Representation of Auditory and
Visual Stimuli on the Tectal Surface From [KNUD95].

3. Auditory-Visual Divided Attention Experimental Findings

During signal detection (temporal in nature and typically associated with
sustained attention or vigilance), the auditory channel proves dominant over the visual
channel, which is why warning signals are typically produced with auditory devices (see
APPENDIX B. AUDITORY-VISUAL CROSS-MODAL SIGNAL DETECTION AND
VIGILANCE BIBLIOGRAPHY). However, in most other areas, our visual sense
dominates the hearing sense, as can be seen from the following experimental findings.

In 1954, the United States Air Force released an extensive technical report which
compared the visual and auditory senses as channels for data presentation during cockpit
crew coordination [HENN54]. As mentioned in this report:

The evidence seems to indicate that when a person is required to divide his attention
or to shift back and forth between two tasks, one visually controlled, the other aurally
controlled, either task can be made a "priority" task at the expense of the other. Sense
channel as such does not determine this priority.

One of the conclusions of this report was that there was little experimental evidence comparing audition and vision as channels for data presentation. The Air Force found that "The majority of the studies have been concerned with receptor processes and sensory thresholds rather than with perceptual phenomena" [HENN54]. Ultimately, the Air Force recognized:

...the  many  practical  difficulties  that  have  stood  in  the  way  of  directly  comparing 
these  two  sense  modalities  [audition  and  vision]  in  the  experimental  laboratory.  It  has  not 
thus  far  been  possible  to  establish  common  dimensions  along  which  to  locate  comparable 
visual  and  auditory  stimuli.  Furthermore,  different  psychophysical  procedures  must 
frequently  be  employed  in  comparing  the  two  modalities  (largely  because  of  the 
temporal-sequential  character  of  auditory  stimuli).  As  a  consequence,  it  is  not  possible  to 
compare  directly  auditory  and  visual  judgments  with  broad  generality  and  high  degree  of 
practicability.  [HENN54] 

Francis Colavita [COLI74] describes a series of experiments exploring sensory
dominance in which subjects responded to suprathreshold auditory and visual stimuli.
The auditory stimuli consisted of tones and the visual stimuli consisted of light flashes.
The stimuli were randomly presented as auditory-only, visual-only, and combined
auditory-visual. The subject's task was to identify which stimuli occurred. When subjects
were presented with the combined auditory-visual stimuli, they typically responded only
that a visual light flash occurred, and usually did not even notice that an
auditory stimulus (tone) was present. Thus, in this task, the findings suggest visual
dominance  over  the  auditory  sense. 

In  a  study  investigating  the  perceived  duration  of  auditory  and  visual  intervals, 
Behar et al. [BEHA74] found that auditory intervals (white noise) were consistently
judged  to  be  about  20%  longer  than  visual  intervals  (light  from  a  neon  glow-lamp)  of  the 
same  duration.  This  finding  "...calls  attention  to  the  contribution  of  peripheral  variables 
and  indicates  that  they  must  not  be  ignored  in  accounting  for  psychophysical  judgments" 
[BEHA74]. 

Burrows  and  Solomon  [BURR75]  conducted  an  experiment  investigating  the 
ability  to  scan  auditory  and  visual  information  in  parallel.  Subjects  were  presented  with 
pairs  of  letters,  one  being  a  visually  presented  letter  and  the  other  being  an  aurally 
presented  letter.  The  pairs  of  letters  were  presented  simultaneously  or  sequentially.  The 
subjects' efficiency of memory retrieval was measured under two conditions: 1)
simultaneously presented letters and 2) sequentially presented letters. Their results
indicated that:

Parallel  scanning  is  possible  with  a  simultaneous  presentation  but  not  with  sequential 
presentation.  In  retrospect,  this  is  not  surprising.  The  simultaneous  condition  provides  the 
opportunity  for  two,  modality  specific,  continuous  records  of  the  auditory  and  visual 
stimuli,  unbroken  by  switches  to  another  modality.  In  the  sequential  condition,  the  record 
for  each  modality  must  contain  "dead  time"  whenever  a  switch  to  the  other  mode  of 
presentation  takes  place.  [BURR75] 

Egeth and Sager [EGET77] explored the locus of visual dominance over audition in an experiment in which subjects responded to suprathreshold stimuli consisting of an audio-only tone, a visual-only light flash, and a combined auditory-visual tone-light flash. Their findings suggest that:

...sensory  or  perceptual  processing  of  the  [auditory]  tone  is  not  affected  by  the  light, 
i.e.,  that  visual  dominance  is  nonsensory  in  locus  and  depends  on  the  relevance  of  the 
[visual]  light  stimulus.  This  interpretation  was  reinforced  by  other  findings  which  showed 
that  the  degree  of  visual  dominance  was  sensitive  to  the  probability  of  light,  tone,  and 
light-plus-tone  trials  and  to  instructions  to  attend  to  a  specific  modality,  but  was  not 
sensitive  to  the  intensity  of  the  light.  [EGET77] 

Jones  and  Kabanoff  [JONE75]  conducted  an  experiment  to  determine  if  eye 
movements  are  a  factor  in  auditory  localization.  Jones  and  Kabanoff  based  this  research 
on  the  hypothesis  that  "...intersensory  effects  depend  upon  anatomical  linkages  of  the 
different  sensory  areas  via  the  motor  cortex,  which  may  serve  to  integrate  neural  activity 
by  sampling  the  state  of  the  different  sensory  receptors"  [JONE75].  They  found  that 
auditory  localization  accuracy  is  increased  if  the  subject  moves  his  eyes  in  the  direction 
of  the  intended  target.  Their  findings  suggest  that  "...voluntary  eye  movement  rather  than 
a  visual  map  is  likely  to  provide  the  framework  for  spatial  judgments"  [JONE75]. 

McGurk  and  MacDonald  [MCGU76]  investigated  the  effect  of  seeing  certain  lip 
movements  associated  with  hearing  contradictory  speech  sounds.  Subjects  were  presented 
auditory-only speech sounds and mismatched auditory-visual (speech-lip movements)
combinations. Their results were remarkable. During the combined auditory-visual
mismatches, most subjects were convinced they were hearing what they were seeing (lip
movements), when in fact the lip movements were not the correct lip movements for the
associated speech sound that they were hearing. Furthermore, prior knowledge of the
auditory-visual mismatches does not prevent one from being convinced (incorrectly) that
one is hearing what one is seeing. The effect is so robust that it is commonly referred
to as the McGurk Effect. It is
interesting  to  note  that  "...the  sight  of  lip  movement  actually  modifies  activity  in  the 
auditory  cortex.  By  whatever  mechanisms  the  visual  cue  actually  enhances  the  processing 
of  auditory  inputs,  it  is  the  functional  equivalent  of  altering  the  signal-to-noise  ratio  of  the 
auditory  stimulus  by  15-20  decibels..."  [STEI93]. 
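To put the 15-20 decibel figure in perspective: a level difference in decibels corresponds to a power (intensity) ratio of 10^(dB/10) and to an amplitude (sound pressure) ratio of 10^(dB/20). The short sketch below simply applies these standard conversions; the function names are illustrative and not taken from the cited work.

```python
def db_to_power_ratio(db):
    """Power (intensity) ratio corresponding to a level difference in decibels."""
    return 10 ** (db / 10.0)


def db_to_amplitude_ratio(db):
    """Amplitude (sound pressure) ratio corresponding to a level difference in decibels."""
    return 10 ** (db / 20.0)


for gain_db in (15, 20):
    print(f"{gain_db} dB: power ratio {db_to_power_ratio(gain_db):.1f}, "
          f"amplitude ratio {db_to_amplitude_ratio(gain_db):.2f}")
```

A 15 dB improvement in signal-to-noise ratio is therefore roughly a 32-fold increase in the power ratio (about 5.6 times in amplitude), and a 20 dB improvement is a 100-fold increase in the power ratio (10 times in amplitude).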

Rosenblum  and  Fowler  [ROSE91]  investigated  if  loudness  judgements  of  speech 
are  more  closely  related  to  the  visual  degree  of  exerted  vocal  effort  than  to  the  actual 
emitted  acoustical  properties  of  intensity.  As  in  the  McGurk  Effect,  subjects  were 
presented  conflicting  audio-visual  stimuli.  Their  findings  suggest  that  when  making 
loudness  judgements  of  speech,  the  visual  cues  of  vocal  effort  significantly  outweigh  the 
cues  provided  by  the  appropriate  levels  of  acoustic  intensity. 

Massaro  and  Warner  [MASS77]  conducted  an  experiment  which  investigated 
divided  attention  between  auditory  and  visual  perception.  In  their  experiment,  subjects 
were  asked  to  recognize  test  tones  and  test  letters  under  selective  and  divided  attention. 
They  concluded  that  "...the  degree  of  capacity  limitations  and  attentional  control  during 
visual  and  auditory  perception  is  small  but  significant"  [MASS77]. 

Hanson  [HANS81]  conducted  an  experiment  to  investigate  if  common  processing 
of  semantic,  phonological,  and  physical  systems  were  involved  during  reading  and 
listening.  Subjects  were  simultaneously  presented  two  words,  one  visually  and  one 
aurally,  but  were  instructed  to  attend  to  only  one  modality  and  to  make  responses  based 
on  that  attended  modality.  Her  results  indicated  that  the  unattended  words  had  an 
influence  on  semantic  and  phonological  decisions,  but  had  no  influence  on  the  physical 
task.  (In  the  physical  task,  the  visual  words  were  presented  in  either  small  or  capital 
letters  and  the  aural  words  were  presented  in  either  a  male  or  female  voice.)  Hanson 
concludes  that  the  written  and  spoken  words  "share  semantic  and  phonological 
processing but have separate modality-specific codes that operate on information prior to
the  convergence  of  information  from  visual  and  auditory  inputs"  [HANS81]. 

G.       AUDITORY-VISUAL  THRESHOLD  PERCEPTION 

The  body  of  evidence  presented  thus  far  clearly  indicates  that  under  certain 
conditions,  auditory-visual  perceptual  phenomena  do  exist.  In  fact,  most  auditory-visual 
research  has  focused  on  threshold  levels,  absolute  sensitivity,  or  just-noticeable- 
differences  (JND).  Gilbert  [GILB41]  and  Ryan  [RYAN40]  independently  conducted 
exhaustive  literature  surveys  covering  these  topics  and  a  summary  of  their  findings  was 
presented  earlier  in  Sensory  Interaction  (Chapter  II,  Section  C).  Additional  evidence 
supporting  auditory-visual  perceptual  phenomena  from  threshold  level  stimuli  can  be 
found  in  the  following  references:  [SERR35]  [PRAT36]  [LOND54]  [THOM58] 
[LOVE70].  Nevertheless,  for  a  better  understanding  of  this  type  of  research,  the  findings 
of  two  experiments  are  presented  showing  auditory-visual  perceptual  phenomena  from 
threshold-level  stimuli. 

An  example  of  the  research  reviewed  by  Gilbert  and  Ryan  is  that  of  Kravkov 
[KRAV36],  one  of  the  early  pioneers  in  the  area  of  intersensory  experimentation. 
Kravkov's  experiment  investigated  the  influence  of  sound  upon  the  light  and  color 
sensitivity  of  the  eye.  In  this  experiment  three  female  subjects  were  presented  an  auditory 
stimulus  consisting  of  a  2100  Hz  tone  at  100  decibels  for  a  duration  of  about  10  minutes. 
During  these  10  minutes,  measurements  were  made  of  color  and  light  sensitivity.  The 
results  are  as  follows: 

1.  The  rod  sensibility  of  the  eye  decreases  under  the  influence  of  simultaneous  sound. 

2.  The  colour  sensibility  of  the  eye  changes  differently  under  the  influence  of  sound, 
according  to  the  wavelength  of  the  stimulating  light.  ...Whereas  the  colour  sensibility  for 
green  rises  during  the  acoustic  stimulation  the  colour  sensibility  for  orange-red  decreases. 
[KRAV36]

In 1952, Gregg and Brogden [GREG52] conducted an experiment on the effect of
simultaneous visual stimulation on absolute auditory sensitivity. In their experiment,
subjects were presented an auditory tone along with an auxiliary light source. Their
results  indicate  that  when  subjects  were  asked  to  report  the  presence  of  a  visual  light 
source  along  with  an  auditory  tone,  the  light  stimulus  decreased  subject  sensitivity  to  a 
1000  Hz  tone.  However,  when  subjects  were  only  required  to  report  the  presence  of  an 
auditory  tone,  the  light  stimulus  increased  sensitivity  to  the  auditory  tone. 

H.       AUDITORY-VISUAL  SUPRATHRESHOLD  PERCEPTION 

This  section  presents  the  motivation  and  findings  of  those  experiments  in  which 
suprathreshold  auditory  stimuli  influenced  visual  perceptual  quality,  fidelity,  or 
resolution;  and/or  suprathreshold  visual  stimuli  influenced  auditory  perceptual  quality, 
fidelity,  or  resolution.  These  experimental  findings  are  of  primary  interest  and  directly 
support  the  motivation  for  this  dissertation. 

1.  Motivation 

When one talks about using both audio and visual displays for some kind of simulation, game, VE, etc., some people will say that the use of high-quality sound positively influences their perception of the visual images. For example, Brenda Laurel states that: "...in the game business we discovered that really high-quality audio will actually make people tell you that the games have better pictures, but really good pictures will not make audio sound better; in fact, they make audio sound worse" [TIER93]. Why is this? The reason is probably that simulations, games, VEs, etc., all started out with only visuals and added sounds later. The addition of the sounds, then, adds to the overall perception of the experience, and as a result the visuals appear better. It is also interesting to note that the reverse, that high-quality visual images positively influence the perception of auditory displays, is rarely if ever reported. Why is this? Again, the answer is probably that we are accustomed to games based on visual displays. However, if games had started out as audio only and added visuals later, then perhaps the addition of high-quality visual displays might positively influence subject perception of the auditory displays. Unfortunately, few examples exist to help analyze this hypothesis.

As described earlier in Sensory Interaction (Chapter II, Section C), there are various theories about sensory interaction. In terms of auditory-visual sensory interaction in particular, studies of infants have revealed evidence that there exists a:

...spatially  organized,  functional  relation  between  auditory  and  oculomotor  systems 
from  birth.  This  coordination  may  be  enhanced  by  intrinsic  spatial  properties  of  the  visual 
system  that  act  to  ensure  auditory  and  visual  colocation.  Such  a  functional  relation  might 
in  turn  facilitate  the  detection  of  intermodal  equivalence,  since  sounds  are  usually 
accompanied  by  sights.  [BUTT81] 

Stein  and  Meredith  theorize  that  "combinations  of,  for  example,  visual  and  auditory  cues 
can  enhance  one  another  and  can  also  eliminate  any  ambiguity  that  might  occur  when 
cues  from  only  one  modality  are  available"  [STEI93].  Murch  believes  that  "under  many 
conditions  the  encoding  of  strictly  visual  material  or  strictly  auditory  material  involves 
the use of short-term storage of both systems" [MURC73]. Since auditory and visual
displays can influence each other, as Durand Begault suggests, "...another solution
for  improving  the  immersivity  and  perceived  quality  of  a  visual  display  and  the  virtual 
simulation  in  general  is  to  focus  on  other  perceptual  senses  -  in  particular,  sound" 
[BEGA94].  For  example,  Negroponte  recounts  the  following  story  of  designing  military 
tank  simulators: 

In  the  design  of  military  tank  trainers,  considerable  effort  was  made  to  have  the 
highest  achievable  display  quality  (at  almost  any  cost),  so  that  looking  at  the  display  was 
as  close  to  looking  out  the  window  of  a  tank  as  possible.  Fine.  Only  after  painstaking 
endeavors  to  keep  increasing  the  number  of  scan  lines  did  the  designers  think  to  introduce 
an  inexpensive  motion  platform  that  vibrated  a  little.  By  further  including  some 
additional sensory effects -- tank motor and tread sounds -- so much realism was
achieved  that  the  designers  were  then  able  to  reduce  the  number  of  scan  lines;  they 
nonetheless  exceeded  the  requirement  that  the  system  look  and  feel  real.  [NEGR95] 

However, empirical evidence of how auditory and visual displays influence each other's perceived quality is lacking. One reason for this lack of evidence is that "...the first problem in comparing vision and hearing is of specifying perceptually relevant dimensions for both modalities, a problem which still resists truly satisfactory solution" [JONE81]. Nevertheless, an exhaustive literature review identified only the following experiments in which auditory displays influenced the quality perception of visual displays or visual displays influenced the quality perception of auditory displays.

2.  Experimental  Results 

W. Russell Neuman [NEUM90] [NEUM91] conducted an experiment to measure the effect of changes in audio quality on the visual perception of High-Definition Television (HDTV). The experimental design kept the quality of the visual stimuli constant while manipulating only the auditory stimuli. The auditory conditions were low fidelity (very low-quality speaker system) vs. high fidelity (very high-quality speaker system) and monaural vs. stereo, presented with three types of television programming: sports, situation comedy, and action-adventure. Subjects were presented a short video clip along with one of the auditory conditions and were then asked to rate 1) their liking, 2) their level of interest, 3) their psychological involvement in the programming, 4) picture quality, and 5) audio quality. Their results indicated that subjects "...had a difficult time distinguishing mono from stereo and even low-fidelity from high-fidelity sound. ...[and] video with better quality and stereo sound were consistently rated as more likable, interesting, and involving" [NEUM91]. Perhaps the most interesting finding was that a few subjects perceived an increase in visual quality when it was coupled with better audio, even though the visual quality remained constant throughout the experiment. This finding, however, was not statistically significant, and it occurred in only one of the three types of television programming presented.

Iwamiya  [IWAM92]  investigated  the  effect  of  visual  information  on  the 
impression  of  sound  and  the  effect  of  auditory  information  on  the  impression  of  visual 
images  when  listening  to  music  via  audio-visual  media.  The  factors  used  to  evaluate  the 
impression  of  both  audio  and  visual  images  were:  tightness,  evaluation,  brightness, 
uniqueness,  and  cleanness.  "These  factors  are  considered  to  be  the  intermodalities 
between  auditory  and  visual  processing"  [IWAM92].  Iwamiya  found  that  the  factors  of 
brightness, tightness, and cleanness of the auditory images enhanced the perception of
brightness,  tightness,  and  cleanness  of  the  visual  images.  Iwamiya  concludes  that:  "The 
better  the  matching  of  sound  and  image,  the  higher  the  evaluation  of  auditory  and  visual 
impression.  This  kind  of  synagetic  interaction  is  controlled  by  the  feedback  loop  from  the 
total  integrated  impression  of  auditory  in  visual  information."  [IWAM92] 

Hollier  and  Voelcker  [HOLL97]  conducted  an  experiment  investigating  the 
influence  of  video  quality  on  audio  perception.  Thirty-two  subjects  watched  video  clips 
10  seconds  in  duration  with  supporting  audio  (speech)  commentaries.  In  total  there  were 
eight  video  quality  variations  and  four  audio  quality  variations.  Their  results  indicated 
that  1)  when  no  video  was  present,  the  perceived  audio  quality  was  always  worse  than  if 
video  was  present,  and  2)  although  only  small  differences  were  noted,  a  decrease  in  video 
quality corresponded to a decrease in perceived audio quality. They ultimately propose an algorithmic approach to developing an auditory-visual cross-modal perceptual model, depicted in Figure 29. In their final discussion of the experiment, Hollier and Voelcker state that "for a majority of applications both in the communications and entertainment industry separate evaluation of audio or video quality is likely to become of limited value" [HOLL97].

Figure 29. Auditory-Visual Perceptual Model From [HOLL97].

Two  companion  papers  by  Woszczyk  et  al.  [WOSZ95]  and  Bech  et  al.  [BECH95] 
discuss  the  design  and  results  of  an  experimental  procedure  examining  the  interaction 
between  the  auditory  and  visual  modalities  in  the  context  of  a  home  theater  system.  Their 
approach  acknowledges  that  "...experiments  involving  both  modalities  require  a  novel 
approach  that  recognizes  domains  of  cooperative  interaction  between  the  senses" 
[WOSZ95]. With growing interest in, and development of, virtual reality systems, Woszczyk identifies the need for testing the interaction of audio and visual displays in order to bring about "substantial improvements in the integration of various audio and video parts of these [virtual reality] systems, and thereby provide important perceptual benefits that enhance [the] audio-visual experience of the viewers" [WOSZ95]. Testing audio-visual interaction is critical because "Auditory and visual channels work both independently and in mutual cooperation on both cognitive and sensory levels of perception" [WOSZ95]. To study the interaction between the audio and visual sensory modalities, "it is necessary to focus on the total experience and not on the two modalities individually" [BECH95], which supports Woszczyk et al.'s observation that "The matching of auditory and visual data triggers perceptual synergy between modalities and promotes intermodal fusion" [WOSZ95]. In their experiments, subjects assessed audio-visual reproductions along the subjective dimensions of action, space, mood, and motion, answering specific questions on quality, magnitude, degree of involvement, and audio-visual balance. Quality was defined as the distinctness, clarity, and detail of the impression. One finding of particular interest is that both perceived visual quality and perceived audio quality increased with increasing screen size. To further explore auditory-visual interaction, Bech conducted two more experiments investigating the influence of stereophonic (audio) width on the perceived quality of an audio-visual presentation using multichannel surround sound systems. During the experiments, subjects were asked to evaluate the quality (fidelity) of the spatial information contained in audio-visual reproductions. The results indicate that "the quality of [perceived] spatial reproduction increases linearly with an increase in the stereophonic [audio] width" [BECH97].

Hugonnet [HUGO97] presents what he considers to be a new concept of spatial coherence between sound and picture in stereophonic TV production: "From a cultural and historical point of view, our perception of sound corresponding to image has remained monophonic" [HUGO97]. Accordingly, Hugonnet describes production and post-production methods for achieving spatial coherence of stereo sound with various TV content, including talk shows with two people, talk shows with more than two people, concerts, sports, and drama. He found that when people were first exposed to stereo sound while watching TV, they found the relation between the visual and auditory images strange and not very comfortable. However, once they had become accustomed to stereo sound, re-exposure to mono sound led them to perceive it as lower in quality. Hugonnet concludes by recognizing the importance of auditory-visual interaction: "It is up to us to bring about a radical change in audiovisual perception, where sound will gain its right place, on a par with the visual image" [HUGO97].

I.         SUMMARY 

In  summary,  this  chapter  has  provided  an  overview  of  Virtual  Environments, 
Auditory-Visual  Perceptual  Organization,  Auditory-Visual  Art  Forms  and  Film, 
Auditory-Visual  Cross-Modal  Matching,  Visual  Dominance  over  Audition,  Auditory- 
Visual  Threshold  Perception,  and  Auditory-Visual  Suprathreshold  Perception. 
IV. EXPERIMENTAL DESIGN OVERVIEW

A.  INTRODUCTION 

This  chapter  describes  the  motivation  and  initial  considerations  that  led  to  the 
development  of  the  experimental  design  used  to  gather  empirical  evidence  supporting 
suprathreshold  auditory-visual  cross-modal  quality  perception  phenomena.  The  various 
considerations  outlined  in  this  chapter  were  instrumental  in  developing  the  experimental 
design  of  the  pilot  study  which  ultimately  led  to  the  three  main  experiments  forming  the 
foundation of this dissertation. The pilot study and the three main experiments are described in detail in Chapters VI through IX. Thus, the intent of this chapter is not to focus on details, but rather to provide an overview of the choices that were considered during the initial development of the experimental design.

B.  MOTIVATION 

The exhaustive background and literature review outlined in the previous two chapters yields the following key observations:

1)  There  is  neurological  and  physiological  evidence  supporting  auditory-visual 
cross-modal  perception  phenomena. 

2)  There  is  psychological  and  psychophysical  evidence  supporting  auditory-visual 
cross-modal  perception  phenomena. 

3)  There  is  empirical  evidence  supporting  the  ability  to  divide  attention  between 
audition  and  vision. 

4)  There  is  empirical  evidence  suggesting  that  sound  can  influence  the  perceived 
mood  of  motion  pictures. 

5)  There  is  empirical  evidence  supporting  auditory-visual  cross-modal  perception 
phenomena  concerning  increased  sensitivity/acuity  in  audition  and/or  vision. 

6)  There  is  a  need  to  enhance  multimedia  and  VE  development  through  better 
understanding  of  auditory-visual  cross-modal  perception  phenomena. 
7) There is a lack of empirical evidence supporting auditory-visual cross-modal
perception  phenomena  in  which  suprathreshold  auditory  stimuli  influenced  visual 
perceptual  quality  and  suprathreshold  visual  stimuli  influenced  auditory  perceptual 
quality. 

Based on these key observations, which stem from wide-ranging interdisciplinary research, there is a need for empirical evidence supporting suprathreshold auditory-visual cross-modal quality perception phenomena. The ultimate goal of this dissertation is to answer the following question: In an audio-visual display, what effect (if any) do various audio quality levels have on the perception of visual quality, and what effect do various visual quality levels have on the perception of auditory quality? The following are some more specific questions derived from it:

1) Are changes in the audio and/or visual qualities of an audio-visual display perceivable, and can these changes also be attended to?

2)  Does  a  high-quality  auditory  display  coupled  with  a  low-quality  visual  display 
cause  a  decrease/increase  in  the  perception  of  audio  quality  and/or  an  increase/decrease  in 
the  perception  of  visual  quality  relative  to  established  baseline  conditions  derived  from 
auditory-only  and  visual-only  quality  perception  evaluations? 

3)  Does  a  low-quality  auditory  display  coupled  with  a  high-quality  visual  display 
cause  an  increase/decrease  in  the  perception  of  audio  quality  and/or  a  decrease/increase  in 
the  perception  of  visual  quality  relative  to  established  baseline  conditions  derived  from 
auditory-only  and  visual-only  quality  perception  evaluations? 

4)  Does  a  low-quality  auditory  display  coupled  with  a  low-quality  visual  display 
cause  a  decrease/increase  in  the  perception  of  audio  quality  and/or  a  decrease/increase  in 
the  perception  of  visual  quality  relative  to  established  baseline  conditions  derived  from 
auditory-only  and  visual-only  quality  perception  evaluations? 

5)  Does  a  high-quality  auditory  display  coupled  with  a  high-quality  visual  display 
cause  an  increase/decrease  in  the  perception  of  audio  quality  and/or  an  increase/decrease 
in  the  perception  of  visual  quality  relative  to  established  baseline  conditions  derived  from 
auditory-only  and  visual-only  quality  perception  evaluations? 

In order to answer these questions concerning auditory-visual perceptual phenomena, the approach taken was to conduct an experiment measuring responses to various auditory-visual suprathreshold stimuli. The overall design of the
experiment  consists  of  three  main  portions:  1)  visual-only  displays,  2)  auditory-only 
displays,  and  3)  combined  auditory-visual  displays.  During  the  visual-only  portion, 
subjects  are  presented  visual  displays  and  are  then  asked  to  rate  their  visual  quality. 
During  the  auditory-only  portion,  subjects  are  presented  auditory  displays  and  are  then 
asked  to  rate  their  auditory  quality.  During  the  combined  auditory-visual  portion,  subjects 
are  presented  combined  auditory-visual  displays,  and  are  then  asked  to  rate  the  quality  of 
both  the  auditory  portion  and  visual  portion  of  the  combined  auditory-visual  display.  The 
goal  is  to  compare  the  subject's  quality  ratings  made  during  the  visual-only  and  auditory- 
only  portions  with  the  subject's  visual  and  auditory  quality  ratings  made  during  the 
combined  auditory-visual  portion.  The  results  of  this  comparison  are  analyzed  to  answer 
the  questions  of  interest,  and  as  such  are  the  quintessential  contribution  of  this 
dissertation.  The  initial  design  considerations  of  this  experiment  are  now  presented. 
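The comparison at the heart of this design can be sketched as follows. This is a minimal illustration, not the StatView analysis used in this dissertation: the `Rating` record, its field names, and the numeric rating scale are hypothetical, and no significance test is performed. For one display and quality level, it averages each subject's rating in the combined auditory-visual portion minus that subject's rating of the same display in the visual-only or auditory-only portion.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Rating:
    subject: str      # hypothetical subject identifier
    display: str      # which display was rated: "visual" or "auditory"
    quality: str      # quality level of the rated display, e.g. "high" or "low"
    paired_with: str  # quality level of the other modality, or "none" for the baseline
    score: float      # rating on a hypothetical numeric scale


def cross_modal_shift(ratings, display, quality, paired_with):
    """Mean per-subject change in the rating of one display when it is coupled
    with the other modality at level `paired_with`, relative to the same display
    rated alone (the visual-only or auditory-only baseline).

    Assumes one rating per subject per condition; a later duplicate overwrites
    an earlier one. Positive values mean the coupled display was rated higher.
    """
    def scores(partner):
        return {r.subject: r.score for r in ratings
                if r.display == display and r.quality == quality
                and r.paired_with == partner}

    baseline, combined = scores("none"), scores(paired_with)
    shared = sorted(baseline.keys() & combined.keys())
    if not shared:
        raise ValueError("no subject rated both the baseline and the coupled condition")
    return mean(combined[s] - baseline[s] for s in shared)


ratings = [
    Rating("s1", "visual", "high", "none", 6),
    Rating("s1", "visual", "high", "high", 7),
    Rating("s2", "visual", "high", "none", 5),
    Rating("s2", "visual", "high", "high", 6.5),
]
print(cross_modal_shift(ratings, "visual", "high", "high"))  # 1.25
```

The sign of the result maps directly onto the "increase/decrease" wording of the questions above; whether a shift is meaningful would still require an appropriate statistical test across subjects.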

C.       DESIGN  CONSIDERATIONS 

1.  Software  and  Hardware 

The  first  key  consideration  in  the  experimental  design  is  that  the  experiment  be 
automated.  The  goal  is  to  create  a  computer  program  that  can  render  visual-only, 
auditory-only,  and  combined  auditory-visual  displays  while  also  capturing  the  required 
responses  of  the  subject.  An  automated  experiment  is  chosen  since  it  helps  to  produce 
identical  testing  conditions,  thereby  reducing  any  potential  confounds  (i.e.,  confounding 
factors)  that  might  arise  through  human  error.  Keeping  in  mind  the  self-imposed 
limitations  described  earlier  in  LIMITATIONS  (Chapter  I,  Section  E),  the  software 
chosen  for  the  experiment  consisted  of  HTML,  Java,  JavaScript,  and  VRML  (all  freely 
downloadable). The basic idea is to contain the entire experiment within an HTML browser window, as depicted in Figure 30. The visual-only, auditory-only, and combined auditory-visual displays could then be rendered via JavaScript and/or VRML within the main HTML window. The subjects' responses are then obtained with rating scales presented in Java pop-up windows, as depicted in Figure 31. Furthermore, based on the software utilized, and keeping in mind the limitations of this dissertation, a personal computer (PC) was used for all experiments. The specifics of the software and hardware used are explained in greater detail in the descriptions of the pilot study and the three main experiments in subsequent chapters.

Figure 30. Netscape HTML Browser Window.

Figure 31. Java Pop-up Visual Display Rating Scale.

2.  Visual  Displays 

Important  considerations  in  the  development  of  this  experiment  include  choosing 
the  rendering,  type/content,  and  quality  manipulation  parameters  of  the  visual  displays. 
The possible rendering choices of the visual displays considered were: 17-inch computer monitor, 20/21-inch computer monitor, 28-inch computer monitor, large-screen TV, and triple large-screen TVs. Because of fidelity considerations and the limited amount of controlled laboratory space available, the TVs were not used. The high cost of the 28-inch monitor precluded its use, and the 17-inch monitor proved to be too small. As a result, a 20-inch computer monitor was selected to render all the visual displays.

Choosing the type and content of the visual display was perhaps the most difficult task during the development of the experiment. The types of visual displays considered were static (still image) or dynamic (motion video, user-controlled navigation in 2D space, or user-controlled navigation in 3D space). To avoid the excessive computational requirements of motion video, to reduce frame-rate synchronization errors with the associated auditory displays, and to reduce the user-computer interaction training and variations associated with user-controlled navigation, static images were chosen as the display type. Once the decision was made to use static visual displays, the next difficult task was to choose the content. After considering numerous possibilities, two visual displays were chosen: 1) a radio and 2) a scene depicting a bowl of fruit and flowers. Figure 32 and Figure 33 depict (in color) the radio and the fruit-flower scene, respectively. The rationale for the choice of content of these displays will be explained in greater detail during the description of the pilot study and three main experiments in subsequent chapters.

Figure 32. Color Visual Display of Radio.

Figure 33. Color Visual Display of Fruit-Flower Scene.

Once the rendering and type/content of the visual displays had been determined, the quality-manipulation parameters were selected. Since the results of this research are intended to benefit multimedia and VE development, pixel resolution and noise level were chosen as the quality parameters to be manipulated. Selecting pixel resolution is perhaps the most common decision in creating visual scenes for any VE. Increasing pixel resolution corresponds to an increase in realism at the expense of 1) an increase in rendering time, 2) an increase in storage requirements, and 3) an increase in download time (if networked). Thus, the VE developer must carefully consider the amount of pixel resolution required. Noise level, the other parameter, was chosen on similar grounds, by analogy with the quality levels of MPEG video: high-quality MPEG video has a greater signal-to-noise ratio than low-quality MPEG video. Thus, a lower-quality visual image will have a greater noise level than a higher-quality image. Another reason for using noise level was the visual display's eventual coupling with an auditory display, which is explained in the next section. A final consideration in the choice of visual displays was the ability to produce the various required quality levels. For example, if a potential quality metric cannot be produced due to software or hardware constraints, then that quality metric is not feasible. Since Adobe Photoshop [ADOB98] was used, its capabilities set the limits of possible quality parameter manipulation. As such, all the visual displays used throughout all the experiments were developed using Adobe Photoshop.
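As a rough illustration of the two visual quality parameters, the sketch below operates on a grayscale image stored as a list of rows of 0-255 values. It is a stand-in under stated assumptions, not a reproduction of the Adobe Photoshop operations actually used: lower pixel resolution is approximated by block averaging at the original display size, and noise is modeled as zero-mean Gaussian noise with a chosen standard deviation. The function names are hypothetical.

```python
import random


def reduce_resolution(image, block):
    """Approximate a lower pixel resolution for a grayscale image given as a
    list of rows of 0-255 values: each block x block tile is replaced by its
    mean, so the image keeps its size but shows fewer distinct pixels."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y0 in range(0, height, block):
        for x0 in range(0, width, block):
            ys = range(y0, min(y0 + block, height))
            xs = range(x0, min(x0 + block, width))
            tile_mean = round(sum(image[y][x] for y in ys for x in xs) / (len(ys) * len(xs)))
            for y in ys:
                for x in xs:
                    out[y][x] = tile_mean
    return out


def add_gaussian_noise(image, sigma, seed=None):
    """Add zero-mean Gaussian noise with standard deviation `sigma` (in gray
    levels) to every pixel, clipping the result to the 0-255 range."""
    rng = random.Random(seed)
    return [[min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
            for row in image]


img = [[0, 10, 20, 30],
       [40, 50, 60, 70]]
print(reduce_resolution(img, 2))  # [[25, 25, 45, 45], [25, 25, 45, 45]]
```

A larger `block` corresponds to a lower effective resolution and a larger `sigma` to a lower signal-to-noise ratio. Keeping the output at its original size is one plausible way to compare resolutions on a single fixed monitor.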

3.  Auditory  Displays 

Equally  important  considerations  in  the  development  of  this  experiment  were 
choosing  the  fidelity,  rendering,  content,  and  quality  manipulation  parameters  of  the 
auditory  displays.  The  possible  fidelity  choices  of  the  auditory  displays  considered  were: 
monophonic,  stereophonic,  and  spatialized.  The  rendering  possibilities  of  the  auditory 
displays  considered  were:  headphones,  left  and  right  small-computer  speakers,  left  and 
right high-fidelity speakers, quad configuration of high-fidelity speakers, and surround-sound configuration of high-fidelity speakers. In order to minimize any potential experimental confounds due to varying room acoustics, headphones were chosen to render the auditory displays. Similarly, to minimize any unforeseen confounds from using stereophonic or spatialized sound, monophonic fidelity was chosen for all auditory displays. Another reason for choosing monophonic audio fidelity was the static nature of the visual displays. Once the decision was made to use monophonic auditory displays, the next difficult task was to choose the content. After considering numerous possibilities, a music selection was chosen as the content of the auditory displays. The rationale for using music as the content of the auditory displays will be explained in greater detail during the description of the pilot study and three main experiments in subsequent chapters. Once the fidelity, rendering, and content of the auditory displays had been determined, the quality manipulation parameters were selected.

As stated earlier, since the results of this research are intended to benefit multimedia and VE development, sampling frequency and noise level were chosen as the quality parameters to be manipulated. The choice of sampling frequency is analogous to that of pixel resolution: increasing sampling frequency corresponds to an increase in realism at the expense of 1) an increase in rendering time, 2) an increase in storage requirements, and 3) an increase in download time (if networked). Thus, the VE developer must carefully consider sampling frequencies. Noise level, the other parameter, was chosen because signal-to-noise ratio is another common quality metric of audio. Gaussian noise in particular was also chosen because of the eventual coupling of auditory displays to visual displays with varying noise levels. As such, the level of Gaussian noise becomes a common quality metric between the auditory and visual displays, as will be explained in greater detail during the description of the main experiments in subsequent chapters. As with the visual displays, a final consideration in the choice of auditory displays was the ability to produce the various required quality levels. For example, if a potential quality metric cannot be produced due to software or hardware constraints, then that quality metric is not feasible. Since Sonic Foundry's Sound Forge software [SONI98] was used, its capabilities set the limits of possible quality parameter manipulation. As such, all the auditory displays used throughout all the experiments were developed using Sound Forge.
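The two auditory parameters can be made concrete with a short sketch. The storage arithmetic shows why sampling frequency trades realism against storage and download time, and the noise function adds Gaussian white noise at a chosen signal-to-noise ratio. The sampling frequencies, the 16-bit mono format, and the function names are illustrative assumptions, not the levels or the Sound Forge processing used in the experiments.

```python
import math
import random


def pcm_bytes(sample_rate_hz, seconds, bits_per_sample=16, channels=1):
    """Storage for uncompressed PCM audio: samples/s x s x bytes/sample x channels."""
    return int(sample_rate_hz * seconds * (bits_per_sample // 8) * channels)


def add_noise_at_snr(samples, snr_db, seed=None):
    """Return `samples` plus zero-mean Gaussian white noise whose expected power
    is `snr_db` decibels below the mean power of the signal. Because the noise
    is random, the realized SNR of a finite signal scatters slightly around it."""
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / 10 ** (snr_db / 10)
    rng = random.Random(seed)
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


# Ten seconds of mono 16-bit audio at two illustrative sampling frequencies.
print(pcm_bytes(44_100, 10))  # 882000
print(pcm_bytes(11_025, 10))  # 220500
```

Quartering the sampling frequency quarters the storage (and, for a fixed link, the download time); lowering `snr_db` raises the noise level relative to the music.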

4.  Location  and  Subjects 

The location for all experiments was the Naval Postgraduate School
(NPS) in Monterey, California. To limit external environmental noises and to control
distractions,  all  experiments  were  conducted  within  an  isolated  room  (office)  in  which  the 
experimenter  had  total  control  of  audio  and  visual  conditions.  As  such,  scheduling 
conflicts  typically  associated  with  the  main  laboratory  were  eliminated,  which  greatly 
facilitated  the  process  of  running  experiment  sessions.  Furthermore,  since  all  experiments 
were  conducted  at  NPS,  the  NPS  student  body  provided  an  excellent  source  of  engaged 
and  attentive  volunteer  subjects. 

5.  Data  Analysis 

Another important consideration in the experimental design was the eventual data analysis process. The important factor was that the data collection format had to mesh with the data analysis process. As such, a considerable amount of time was spent deciding how to analyze the resulting data even before the data were collected. Accordingly, the chosen method of data analysis helped to determine the format of data collection. Since StatView [SASI98] software was chosen for the statistical analysis of the experimental results, the data collection process was in turn automated to ease importing the data into StatView.
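One way to make collected responses easy to import is sketched below, assuming that tab-delimited text with a single header row is an acceptable import layout for the statistics package. The actual file format and StatView import settings used in this work are not described here, the column names are hypothetical, and the experiments themselves were written in Java and JavaScript; Python is used only for illustration.

```python
import csv
import io

# Hypothetical column layout: one row per rating, one column per variable.
FIELDS = ["subject", "portion", "display", "quality", "paired_with", "score"]


def to_tab_delimited(rows):
    """Serialize rating records (dicts keyed by FIELDS) as tab-delimited text
    with a single header row of variable names."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


text = to_tab_delimited([
    {"subject": "s1", "portion": "visual-only", "display": "visual",
     "quality": "high", "paired_with": "none", "score": 6},
])
print(text, end="")
```

Writing one row per rating with a fixed column order means the file can be appended to as each response arrives and loaded without manual re-entry.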

D.       DESIGN  SELECTIONS 

Based on the motivation and initial design considerations, a pilot study was designed to investigate the perceptual effects of manipulating visual display pixel resolution and auditory display sampling frequency. The visual display consisted of the aforementioned radio, and the auditory display was a selection of music. The entire automated experiment was contained within an HTML browser window, using VRML to render the visual-only, auditory-only, and combined auditory-visual displays and Java pop-up windows to collect subject responses. The details of the experimental design are outlined in Chapter VI. The lessons learned from this pilot study were instrumental in designing the three main experiments of this dissertation: 1) Experiment 1: Static Resolution, 2) Experiment 2: Static Noise, and 3) Experiment 3: Static Resolution NonAlphanumeric. Each experiment was fully automated and contained within an HTML browser window, using JavaScript to render the visual-only, auditory-only, and combined auditory-visual displays and Java pop-up windows to collect subject responses.

As its name implies, Experiment 1: Static Resolution is designed to investigate the perceptual effects of manipulating visual (static, as opposed to dynamic) display pixel resolution and auditory display sampling frequency. The visual display consisted of the aforementioned radio, and the auditory display was a selection of music. The details of the experimental design are outlined in Chapter VII.

Experiment 2: Static Noise is designed to investigate the perceptual effects of manipulating visual (static) display Gaussian noise level and auditory display Gaussian noise level. The visual display consisted of the aforementioned radio, and the auditory display was a selection of music. The details of the experimental design are outlined in Chapter VIII.
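Here, "Gaussian noise level" refers to the spread (standard deviation) of random values added to each sample. The following is a minimal sketch of that manipulation, not the tool actually used to build the stimuli. It assumes 8-bit sample values (for example, one color channel of an image) and clamps results to the range 0 to 255. The class name, method name, and seed parameter are hypothetical. An audio version would follow the same pattern, using the sample range of the audio format.

```java
import java.util.Random;

// Sketch: adds zero-mean Gaussian white noise with standard deviation sigma
// to 8-bit samples, clamping each result to the valid range 0..255.
final class GaussianNoise {
    static int[] addNoise(int[] samples, double sigma, long seed) {
        Random rng = new Random(seed); // fixed seed -> reproducible stimulus
        int[] out = new int[samples.length];
        for (int i = 0; i < samples.length; i++) {
            // nextGaussian() draws from N(0, 1); scaling by sigma sets the noise level
            double noisy = samples[i] + sigma * rng.nextGaussian();
            out[i] = (int) Math.max(0, Math.min(255, Math.round(noisy)));
        }
        return out;
    }
}
```

Fixing the seed makes each noisy stimulus reproducible, so every subject sees the identical display at a given noise level.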

Experiment 3: Static Resolution NonAlphanumeric is designed to investigate the perceptual effects of manipulating visual (static) display pixel resolution and auditory display sampling frequency. The visual display consisted of the aforementioned fruit-flower scene, and the auditory display was a selection of music. The details of the experimental design are outlined in Chapter IX.

E.        SOFTWARE  DESIGN 

To clarify the type of computer programming used to develop the main experiments, a brief overview of the software design and development is now provided.
1. Overview

All software used in the development of the main experimental design was custom-designed and encapsulated in HTML files. For each main experiment, a total of nine HTML files were developed. Each HTML file corresponds to a predetermined randomized sequence of the appropriate auditory-only, visual-only, and combined auditory-visual stimuli. This randomization is based on the Latin square technique (see [GOOD95] for a description of the technique). As such, to initiate an experiment testing session, the appropriate HTML file is simply executed. To minimize delays in rendering any of the auditory or visual stimuli, all auditory and visual displays (files) are preloaded into memory when the HTML file is first executed.
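[GOOD95] is cited for the Latin square technique, but the exact square is not given here. One common construction is the cyclic Latin square, sketched below as an assumption rather than the square actually used. Row r gives the presentation order for one HTML file, and cell (r, c) holds condition (r + c) mod n, so each of the n conditions appears exactly once in every row and exactly once in every serial position. The condition codes are hypothetical. They follow one reading of the JavaScript function names in Figure 35: auditory level, then visual level, then "C" for combined.

```java
// Sketch of a cyclic Latin square and hypothetical condition codes.
final class LatinSquare {
    // Row r is one presentation order; cell (r, c) = (r + c) mod n, so every
    // condition occurs once per row and once per serial position (column).
    static int[][] cyclic(int n) {
        int[][] square = new int[n][n];
        for (int r = 0; r < n; r++) {
            for (int c = 0; c < n; c++) {
                square[r][c] = (r + c) % n;
            }
        }
        return square;
    }

    // Nine hypothetical codes: auditory level, visual level, then "C" (combined).
    static String[] conditionCodes() {
        String[] levels = {"L", "M", "H"};
        String[] codes = new String[levels.length * levels.length];
        int k = 0;
        for (String audio : levels) {
            for (String visual : levels) {
                codes[k++] = audio + visual + "C";
            }
        }
        return codes;
    }
}
```

When mapped through `conditionCodes()`, row 0 of `cyclic(9)` gives the order LLC, LMC, ..., HHC, and each later row starts one position further along. A cyclic square balances serial position but not which condition immediately follows which. A Williams design is a common refinement when carryover effects matter.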

2.  Development 

The  development  of  the  overall  software  design  of  the  main  experiment  was 
divided  into  three  main  components:  1)  displaying  instructions,  2)  auditory  and  visual 
display  rendering,  and  3)  user  input. 

a.    Displaying  Instructions 

Since  the  experiment  is  to  be  automated,  the  user  (subject)  is  presented 
with  numerous  sets  of  instructions.  The  wording  of  the  various  sets  of  instructions  was 
fine-tuned  throughout  the  pilot  study  in  order  to  eliminate  any  possible  ambiguities.  All 
the  various  sets  of  instructions  were  written  as  separate  Java  applets  which  were  simply 
embedded  into  the  main  HTML  code.  As  such,  all  nine  HTML  files  shared  the  same  Java 
instruction  applets.  Thus,  if  any  one  set  of  instructions  needed  to  be  rewritten  for  clarity, 
only  that  one  set  of  instructions  had  to  be  rewritten  and  recompiled,  as  opposed  to 
rewriting  the  instructions  in  all  nine  HTML  files.  An  example  of  the  Java  programming 
code  used  to  produce  one  set  of  instructions  is  depicted  in  Figure  34. 
import netscape.javascript.*;  // LiveConnect: lets the applet call JavaScript in the page
import java.applet.*;
import java.awt.*;
import java.awt.event.*;

public class InstructionsAudioVisual extends Applet implements WindowListener,
        ActionListener {
    private Button EnterButton;
    private Panel EnterPanel;
    private TextArea Text;
    public JSObject win;

    public void init() {
        // 9 rows x 67 columns, no scroll bars
        Text = new TextArea("\n", 9, 67, TextArea.SCROLLBARS_NONE);
        Text.append("  (1) You will now be rating the VISUAL quality of a combined audio-visual display.\n");
        Text.append("  \n");
        Text.append("  (2) A total of 9 audio-visual displays will be presented randomly.\n");
        Text.append("  \n");
        Text.append("  (3) Each audio-visual display will be presented for 8 seconds.\n");
        Text.append("  \n");
        Text.append("  (4) After which, you will be prompted ONLY for your VISUAL rating.\n");
        Text.append("  \n");

        // Centered "Press to Continue" button
        EnterPanel = new Panel();
        EnterPanel.setLayout(new FlowLayout(FlowLayout.CENTER));
        EnterButton = new Button("Press to Continue");
        EnterButton.addActionListener(this);
        EnterPanel.add(EnterButton);

        // Stack the text area above the button panel
        GridBagLayout gridbag = new GridBagLayout();
        GridBagConstraints c = new GridBagConstraints();
        setFont(new Font("Helvetica", Font.PLAIN, 14));
        setLayout(gridbag);
        c.fill = GridBagConstraints.BOTH;

        c.gridwidth = GridBagConstraints.REMAINDER; // end row
        gridbag.setConstraints(Text, c);
        add(Text);

        c.gridwidth = GridBagConstraints.REMAINDER; // end row
        gridbag.setConstraints(EnterPanel, c);
        add(EnterPanel);
    } // end init

    // WindowListener methods: only windowClosing does any work
    public void windowClosed(WindowEvent event) { }
    public void windowDeiconified(WindowEvent event) { }
    public void windowIconified(WindowEvent event) { }
    public void windowActivated(WindowEvent event) { }
    public void windowDeactivated(WindowEvent event) { }
    public void windowOpened(WindowEvent event) { }

    public void windowClosing(WindowEvent event) {
        System.gc();
    }

    public void actionPerformed(ActionEvent event) {
        Object source = event.getSource();
        if (source == EnterButton) {
            // Call JavaScript functions defined in the enclosing HTML page
            win = JSObject.getWindow(this);
            win.eval("audioVisualWrite()");
            win.eval("goToAudioVisualDisplays()");
            System.gc();
        } // end if
    } // end actionPerformed
} // end Applet

Figure  34.  Example  of  Java  Applet  used  to  Render  Instructions. 
b. Auditory and Visual Display Rendering

All auditory and visual displays were rendered via JavaScript function calls within the main embedded HTML file. Figure 35 depicts a portion of the JavaScript programming code used to render three combined auditory-visual displays. Specifically, 1) function HLC() is used to render a combined high-quality auditory and low-quality visual display; 2) function HMC() is used to render a combined high-quality auditory and medium-quality visual display; and 3) function HHC() is used to render a combined high-quality auditory and high-quality visual display.


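// highWrite(), medWrite(), lowWrite(), goToCombinedDisplays(), the sound object
// highSound, and the image sources lowVisual, medVisual, and highVisual are
// defined elsewhere in the HTML page.

function HLC() {
    highWrite();   // auditory level: high
    lowWrite();    // visual level: low
    document.highSound.play(false);                     // play once (no looping)
    document.images["RenderDisplays"].src = lowVisual;  // show low-quality image
    goToCombinedDisplays();
}

function HMC() {
    highWrite();   // auditory level: high
    medWrite();    // visual level: medium
    document.images["RenderDisplays"].src = medVisual;  // show medium-quality image
    document.highSound.play(false);                     // play once (no looping)
    goToCombinedDisplays();
}

function HHC() {
    highWrite();   // auditory level: high
    highWrite();   // visual level: high
    document.images["RenderDisplays"].src = highVisual; // show high-quality image
    document.highSound.play(false);                     // play once (no looping)
    goToCombinedDisplays();
}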


Figure  35.  Example  of  JavaScript  Function  Calls. 

c.    User  Input 

All  user  input  is  accomplished  via  Java  Frames  which  contain  the 
appropriate  rating  scales. A  Frame  is  basically  a  window  which  can  be  made  to  appear 
and  disappear  (i.e.,  a  pop-up  window).  Figure  36  depicts  a  portion  of  the  Java 
programming  code  used  to  render  a  visual-only  rating  scale. 
import java.awt.*;
import java.awt.event.*;

// The owner class ShowRatingScalesVisualAndRatingsTest (which provides
// myReturn()) is part of the experiment software and is not shown here.
public class RatingScalesVisualAndRatingsTest extends Frame implements WindowListener,
        ActionListener {
    private ShowRatingScalesVisualAndRatingsTest thisScale;
    public final static String TITLE = "Visual Display Quality Rating Scale";
    Checkbox oneV, twoV, threeV, fourV, fiveV, sixV, sevenV;
    Button EnterButton;
    private Panel VisualPanel, EnterPanel;

    public RatingScalesVisualAndRatingsTest(ShowRatingScalesVisualAndRatingsTest owner) {
        super(TITLE);

        // Seven-point scale: mutually exclusive checkboxes from 1 (LOW) to 7 (HIGH)
        VisualPanel = new Panel();
        VisualPanel.setLayout(new FlowLayout(FlowLayout.CENTER));
        VisualPanel.add(new Label(" <LOW> "));
        CheckboxGroup VisualGroup = new CheckboxGroup();
        oneV = new Checkbox("1", VisualGroup, false);
        VisualPanel.add(oneV);
        twoV = new Checkbox("2", VisualGroup, false);
        VisualPanel.add(twoV);
        threeV = new Checkbox("3", VisualGroup, false);
        VisualPanel.add(threeV);
        fourV = new Checkbox("4", VisualGroup, false);
        VisualPanel.add(fourV);
        fiveV = new Checkbox("5", VisualGroup, false);
        VisualPanel.add(fiveV);
        sixV = new Checkbox("6", VisualGroup, false);
        VisualPanel.add(sixV);
        sevenV = new Checkbox("7", VisualGroup, false);
        VisualPanel.add(sevenV);
        VisualPanel.add(new Label("<HIGH> "));

        // Centered "Press to Continue" button
        EnterPanel = new Panel();
        EnterPanel.setLayout(new FlowLayout(FlowLayout.CENTER));
        EnterButton = new Button("Press to Continue");
        EnterButton.addActionListener(this);
        EnterPanel.add(EnterButton);

        // Two rows: the scale above the button
        setLayout(new GridLayout(2, 1, 1, 3));
        add(VisualPanel);
        add(EnterPanel);
        pack();
        setLocation(180, 220);
        addWindowListener(this);
        thisScale = owner;
    } // end constructor

    // Remaining WindowListener methods (required by the interface)
    public void windowClosed(WindowEvent event) { }
    public void windowOpened(WindowEvent event) { }
    public void windowIconified(WindowEvent event) { }
    public void windowDeiconified(WindowEvent event) { }
    public void windowActivated(WindowEvent event) { }
    public void windowDeactivated(WindowEvent event) { }

    public void windowClosing(WindowEvent event) {
        dispose();
        System.gc();
    }

    public void actionPerformed(ActionEvent event) {
        Object source = event.getSource();
        if (source == EnterButton) {
            thisScale.myReturn();   // hand control back to the owner
            dispose();
            System.gc();
        } // end if
    } // end actionPerformed
} // end Frame


Figure  36.  Example  of  Java  Frame  used  to  Render  Rating  Scales. 
F. SUMMARY

In summary, this chapter has provided an overview of the experimental design process of this research effort, including its motivation, design considerations, eventual design selections, and overall software design.
V. VISUAL AND AUDITORY DISPLAY DEVELOPMENT

A.  INTRODUCTION 

Given that the pilot study was designed to investigate the perceptual effects of manipulating visual-display pixel resolution and auditory-display sampling frequency, the associated visual and auditory displays had to be created. The visual display selected for the pilot study is a radio (Chapter IV, Figure 32), and the auditory display is a selection of music. The rationale for choosing a radio and music is based on the eventual coupling of the auditory and visual displays to form a combined auditory-visual display. Based on 1) psychological factors such as Gestalt perceptual grouping theory and the Ventriloquism Effect, and 2) neurological evidence supporting auditory-visual sensory interaction, an auditory-visual display consisting of a radio and music might be perceptually grouped together, thereby producing a more tightly coupled display. Furthermore, at a higher cognitive level, we are likely to associate music (audio) with a radio (visual). The ultimate goal is for the combined auditory-visual display to be experienced as a single entity, and not as separate auditory and visual displays. The following describes the development process of the visual, auditory, and combined auditory-visual displays used in the pilot study. This development process was instrumental in the eventual experimental design of the three main experiments.

B.  VISUAL-DISPLAY  DEVELOPMENT 

To obtain the visual image of a radio, various techniques were used. First, a digital camera was used to take pictures of a radio in various settings (i.e., indoors and outdoors). However, the lighting and shadowing of these digital photos proved too difficult to manage properly. To eliminate lighting and shadowing problems, the next method involved using a flatbed scanner. The radio was simply placed on the scanner while the scanner recorded its image. This method actually produced fairly good images, but there were still minor lighting and shadowing problems. Ultimately, a photograph of a radio was taken from the book Radios by hallicrafters with Price Guide by Chuck Dachis [DACH95]. This book contains many professionally photographed radios. After deliberating over the many pictures, a particular radio was chosen. This radio image was then digitized using a flatbed scanner at a resolution of 600 pixels/inch. The color version of this radio is depicted earlier in Chapter IV, Figure 32. Since the visual displays of this experiment only involve the manipulation of pixel resolution, the overall color content (impression) of the image does not change much when the pixel resolution changes. As a result, for the remaining discussion of this radio, all figures are presented in black and white. However, it is important to emphasize that during the experiment, the visual displays of the radio were all presented in color. The black and white version of this radio at 600 pixels/inch is presented in Figure 37.

Figure 37. Visual Display of Radio at 600 pixels/inch.

This particular radio was chosen because it contained many different features, including letters and numbers, smooth and rough surfaces, straight and curved lines, patterns (on the speaker), and reflections. The purpose of having numerous features is to provide test subjects with a wide variety of cues from which to make their quality ratings. Incidentally, to avoid any potential copyright infringement, Chuck Dachis, the author of the book, was contacted by telephone to obtain permission to use the photograph of the radio. Chuck Dachis gave his permission to use any photograph necessary for the experiments, and was very pleased that his photographic efforts were being used in scientific research.

Using the original scanned image at 600 pixels/inch, Adobe Photoshop [ADOB98] was then used to make various copies with degraded pixel resolutions, all having the same dimensions, which nearly fill the display area of a 20-inch computer monitor. Approximately 30 images of the radio, ranging from 200 to 600 pixels/inch, were produced. The next step involved establishing levels of pixel resolution that were noticeably different, but not just-noticeably-different or obviously different. The goal was to establish low-, medium-, and high-quality visual displays for use in the experiment. An example of an obvious difference is asking a subject to compare the quality of Figure 37 with that of Figure 38. As one can see, the difference is obvious, so the subject's response would be uninformative. An example of a difference that is perhaps just noticeable is asking a subject to compare the quality of Figure 37 with that of Figure 39. In this case, it is fairly difficult to distinguish the quality difference between the two radios. The basic idea is to create changes in pixel resolution that the subject can distinguish, but only with some effort. This process of establishing the noticeable levels of pixel resolution was very time consuming. Preliminary subjects were presented (using the same graphics accelerator and computer monitor chosen for the experiment, as described later) with about six or seven images of the radio at varying levels of pixel resolution. A subject was then asked to arrange (if possible) the images in ascending or descending order of quality. After this process was repeated with about 15 subjects, a consensus was finally reached, which ultimately determined the low-, medium-, and high-quality visual displays of the radio to be used in the experiment. Resolutions of 425 pixels/inch, 450 pixels/inch, and 500 pixels/inch were selected as the low-, medium-, and high-quality visual displays, respectively, for use in the pilot study.

In general, however, the actual (absolute) pixel resolution is not important, for there are numerous factors that affect the final rendering of the visual display, such as: 1) computer monitor specifications, 2) computer monitor desktop size (resolution), 3) video/graphics accelerator specifications, and 4) software application graphics-rendering capabilities. An example of this last factor, in terms of the pilot study, relates to the capability of rendering textured images via the CosmoPlayer VRML plug-in [COSM98] for Netscape Communicator [NETS98]. Since the visual displays were represented as textured images in CosmoPlayer, the displays were further processed (filtered) by CosmoPlayer. This resulted in noticeably degraded quality in the visual displays. This fact was well known ahead of time and was incorporated into the initial development of the low-, medium-, and high-quality visual displays. As a result, the only way to see the correct representations of the selected low-, medium-, and high-quality displays is to view them through CosmoPlayer. However, because the pilot study implementation was eventually abandoned, it is not possible to adequately depict these visual displays as figures in this dissertation. Nevertheless, the important point is that a relative quality ordering of the visual displays was established, for the intent of this research effort is to focus on the perceptual effects of visual displays of various quality, and not on the absolute levels of pixel resolution that determine them. It is also important to note that even the high-quality visual display has some, albeit slight, pixel resolution degradation. The reason for this lies in the design of the experiment. The goal is to have three noticeably different quality displays based on pixel resolution, and not to have one display with absolutely no perceivable pixel resolution degradation and two displays that do have pixel resolution degradation. If that were the case, the unwanted issue of the absence or presence of noticeable pixel resolution degradation would be introduced. As such, subjects might be comparing the one display with no perceivable degradation to the two displays that do have degradation. Thus, to ensure that subjects make quality ratings based only on the degree of pixel resolution degradation (not its absence or presence), the high-quality display must also have a small amount of perceivable pixel resolution degradation.

Figure 38. Obviously Different Poor-Quality Visual Display of Radio.

Figure 39. Just-Noticeably-Different High-Quality Visual Display of Radio.
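Photoshop's resampling settings are not stated here, so the sketch below only illustrates the general idea behind the degraded copies. The 600 pixels/inch scan is resampled down to a lower target resolution and then scaled back up to its original pixel dimensions, so that every copy is displayed at the same size. For simplicity, it uses nearest-neighbor sampling on a grayscale pixel array. Photoshop also offers interpolating resamplers (such as bicubic), which blur the lost detail rather than making it blocky. The class and method names are hypothetical.

```java
// Sketch: simulate a lower scan resolution (e.g. 600 -> 450 pixels/inch) by
// nearest-neighbor downsampling, then upsample back to the original pixel
// dimensions so that every degraded copy is displayed at the same size.
final class ResolutionDegrade {
    static int[][] degrade(int[][] image, int originalPpi, int targetPpi) {
        int h = image.length, w = image[0].length;
        double scale = (double) targetPpi / originalPpi;
        int smallH = Math.max(1, (int) Math.round(h * scale));
        int smallW = Math.max(1, (int) Math.round(w * scale));

        // Downsample: keep one source pixel per cell of the coarser grid
        int[][] small = new int[smallH][smallW];
        for (int y = 0; y < smallH; y++) {
            for (int x = 0; x < smallW; x++) {
                small[y][x] = image[y * h / smallH][x * w / smallW];
            }
        }

        // Upsample: replicate coarse pixels back to the original dimensions
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                out[y][x] = small[y * smallH / h][x * smallW / w];
            }
        }
        return out;
    }
}
```

Because the output always has the original dimensions, the copies differ only in the amount of detail they retain, which matches the requirement that all copies have the same dimensions.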

C.       AUDITORY-DISPLAY  DEVELOPMENT 

Constructing the auditory displays was much easier than constructing the visual displays, since music can be obtained easily from any compact disc (CD). The only consideration was the musical content. Since the quality parameter to be manipulated in the pilot study is sampling frequency, a conscious decision was made not to include vocals (speech). This is because the frequency range of speech is much narrower than that of typical musical instruments. For example, if the sampling frequency of music containing vocals is altered, the noticeable effect will be greater on the musical instruments than on the vocals. As such, if subjects focused on the vocals (which is fairly common), they might not be aware of any changes to the musical instruments. Therefore, choosing music without vocals eliminates the possibility of subjects focusing on speech, whose quality changes are less perceptible. In terms of the type of music to use, the choices considered were jazz, pop, rock, alternative, and classical. The consideration here is that if a subject is familiar with the music, the subject might have some preconceived expectations or might make unwanted comparisons between a previous listening experience and the auditory display to be evaluated. As such, to reduce the chance that subjects might have previously heard the music, an obscure portion of alternative music was selected. Another consideration in choosing the music was that the experimenter (myself) would have to listen to this piece of music perhaps hundreds of times. So, the particular music selected was also very much liked by the experimenter. The music was taken from a song called A Forest from the CD Mixed Up by The Cure, which was produced by Elektra Entertainment Group, a division of Warner Communications Inc. To avoid any potential copyright infringement, a letter was written to Elektra Records requesting permission to use portions of A Forest for scientific research. Elektra replied with an official letter granting permission to use portions of A Forest as long as a courtesy credit is given (see Figure 40). Thus, in accordance with Elektra's stipulation, portions of A Forest by The Cure, courtesy of Elektra Entertainment Group, are used in the conduct of this experiment. (Thanks, Elektra.)

February 13, 1998
VIA FAX 408-656-2814

Russell Storm
Major, US Army
Dept of Computer Science
Naval Post Graduate School
Monterey, California 93943

Gentleperson:

This will confirm that Elektra Entertainment Group, a division of Warner Communications Inc. ("Elektra") has no objection to your use of portions of the master recording "A Forest" (the "Master") performed by The Cure ("Artist") solely for the purposes of a scientific experiment in connection with your dissertation as described in the attached facsimile dated January 30, 1998. You shall not distribute any copies of the Master.

You acknowledge that as between you and Elektra, Elektra is the exclusive owner of all rights in and to the Master for the United States and Canada, and that you will not use the Master for any purpose other than that described above. You will be responsible for obtaining any other required consents and making all required payments, and you indemnify Elektra from any claims by third parties in connection with the foregoing.

You will provide a courtesy credit as follows: "A Forest" by The Cure courtesy of Elektra Entertainment Group.

Please confirm your acceptance of the foregoing by signing in the space below and returning this letter back to us. Your use of the Master shall constitute such acceptance.

Using the Mixed Up CD by The Cure, a 20-second selection of A Forest was recorded into Sonic Foundry's Sound Forge [SONI98] at a sampling frequency of 44.1 kHz. The portion of music selected contained cymbals (among other instruments), resulting in a very wide frequency range of sound. Sound Forge was then used to reproduce the 44.1 kHz, 20-second musical selection at numerous sampling frequencies ranging from 4 kHz to 44.1 kHz. As with the visual displays, the next step involved establishing sampling frequencies that were noticeably different, but not just-noticeably-different or obviously different. The goal was to establish low-, medium-, and high-quality auditory displays for use in the experiment. The basic idea is to create changes in sampling rate that the subject can distinguish, but only with some effort. This process of establishing noticeable sampling frequencies was again very time consuming. Preliminary subjects were presented (using the same audio card and headphones chosen for the experiment, as described later) with about six or seven music selections at varying sampling frequencies. These subjects were then asked to arrange (if possible) the musical selections in ascending or descending order of quality. After this process was repeated with about 15 preliminary subjects, a consensus was finally reached, which ultimately determined the low-, medium-, and high-quality auditory displays of music to be used in the experiment. Sampling rates of 11 kHz, 17 kHz, and 44.1 kHz were selected as the low-, medium-, and high-quality auditory displays, respectively, for use in the pilot study. A consensus also established a constant volume setting for the auditory displays. Again, it is important to remember that the actual (absolute) sampling frequency is not important, for there are numerous factors that affect the final rendering of any auditory display, such as: 1) how the original sound was produced, 2) audio card specifications, 3) rendering type (i.e., headphones or speakers), and 4) rendering type specifications. Nevertheless, as with the visual displays, the important point is that a relative quality ordering of the auditory displays was established, for the intent of this research effort is to focus on the perceptual effects of auditory displays of various quality, and not on the absolute sampling frequencies that determine them. It is interesting to note that the high-quality auditory display, unlike the high-quality visual display, did not need to be slightly degraded to avoid the absence-or-presence degradation issue that was a concern with the visual displays. The reason for this is that our eyes are accustomed to a certain fidelity (quality), but our ears are not as discerning. This was readily apparent during the process of selecting the three auditory display qualities. When evaluating the various selections, not one subject could distinguish between 44.1 kHz and 22.05 kHz, which could be attributed to the various factors involved in the final rendering of the auditory display, as discussed earlier. Nevertheless, at the higher qualities, the ears were not as discerning when evaluating sampling frequency as the eyes were when evaluating pixel resolution.
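The choice of sampling rate matters because, by the sampling theorem, a signal sampled at rate fs can represent frequencies only up to fs/2 (the Nyquist frequency). For the three selected rates, this gives limits of about 5.5 kHz, 8.5 kHz, and 22.05 kHz. This is why lowering the sampling rate strips away the high-frequency content contributed by instruments such as cymbals. The sketch below computes these limits. For a pure tone, it also computes the frequency the tone would fold back to if it were sampled without an anti-aliasing filter. Whether Sound Forge applied such a filter is not stated here, and the class and method names are hypothetical. Nominal rates of 11,000 Hz and 17,000 Hz are assumed for "11 kHz" and "17 kHz".

```java
// Sketch of two sampling-theorem quantities (class and method names are hypothetical).
final class SamplingLimits {
    // Highest frequency representable at the given sampling rate (the Nyquist frequency).
    static double nyquistHz(double sampleRateHz) {
        return sampleRateHz / 2.0;
    }

    // Apparent frequency of a pure tone sampled at sampleRateHz with no
    // anti-aliasing filter: the tone folds to its distance from the nearest
    // integer multiple of the sampling rate.
    static double aliasedHz(double toneHz, double sampleRateHz) {
        double k = Math.rint(toneHz / sampleRateHz); // nearest multiple of the rate
        return Math.abs(toneHz - k * sampleRateHz);
    }
}
```

For example, an 8 kHz component sampled at 11 kHz without filtering would appear near 3 kHz. With proper filtering before resampling, that component is removed instead.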

D.       AUDITORY-VISUAL  DISPLAY  DEVELOPMENT 

After the visual and auditory displays were established, the next step was to develop the combined auditory-visual displays. The considerations here were 1) determining how long to render the displays, and 2) synchronizing the rendering of the auditory and visual displays. To eliminate any potential confounds, the amount of time a subject is given to view or hear the displays when they are presented separately must equal the time given to view and hear the combined auditory-visual displays. During the process of establishing both the auditory and visual low-, medium-, and high-quality displays, subjects were asked whether they needed more or less time to view or hear the appropriate displays. Based on a consensus, seven seconds was chosen for both displays.
Interestingly, some subjects at first thought they needed more time (around 20 seconds). However, when given more time, they realized that they were changing their minds about the quality too often, and by the time they had to rate the display, they had forgotten what they were thinking. These subjects then requested a shorter duration. In a related experiment conducted to measure scene-dependent quality variations in digitally coded television pictures, subjects were asked to assess distortions introduced by Moving Picture Experts Group (MPEG-2) coding (see [MPEG98]). MPEG-2 sequences 10 and 30 seconds in length were used. One of the findings of this experiment was that the 30-second sequences were too long. This finding supports previous evidence on the duration of human working memory (WM).

There  is  evidence  to  suggest  that  WM  has  a  duration  of  about  20  s  and  that  the  rate  of 
decay  in  WM  is  dependent  on  the  amount  of  information  presented,  as  it  has  a  limited 
capacity.  Both  of  these  facets  of  memory  can  be  seen  as  important  in  the  results,  in  that 
the  end  of  the  sequences  are  more  accessible  to  memory  recall  (the  recency  effect)  and 
may  bias  the  subjects  overall  rating.  [PETE59]  [WICK92]  [ALDR95] 

Although the displays in the pilot study and main experiments are static, as opposed to motion video, the same concept of human WM applies. Therefore, based on subject consensus and human WM theory, all displays for the pilot study, whether presented separately or in combination, are presented to the subject for seven seconds. With all required displays now established, the main design of the pilot study was ready to be developed.

E.       SUMMARY 

In summary, this chapter has provided an overview of the selection and development of the auditory-only, visual-only, and combined auditory-visual displays used in the experimental design of this research effort.
VI.  PILOT  STUDY

A.  INTRODUCTION 

The  pilot  study  played  a  crucial  role  in  this  research  effort.  The  lessons  learned 
from  the  pilot  study  were  essential  to  the  development  and  use  of  appropriate  auditory 
and  visual  displays  and  to  the  overall  design  of  the  three  main  experiments  forming  the 
foundation  of  this  dissertation. 

B.  LOCATION 

All  experiment  sessions  of  the  pilot  study  were  conducted  in  the  same  isolated 
room  under  the  same  ambient  conditions.  The  dimensions  of  the  room  were 
approximately  10  feet  x  20  feet.  Before  each  session,  1)  all  nonessential  electronic 
equipment  was  turned  off,  2)  telephones  were  unplugged,  3)  windows  were  closed  and 
covered with blackout cloth, 4) the main overhead lights were turned off, 5) a 60-watt incandescent desk lamp was turned on behind the computer monitor to eliminate any glare, 6) the door to the room was closed, 7) a Do Not Disturb sign was placed on the outside of the door, and 8) the subject was asked to turn off any audible pagers, mobile phones, and/or watches. This last precaution was added only after a subject's beeper sounded during an experiment session.

C.  PARTICIPANTS 

A total of 22 volunteer participants (6 female, 16 male), drawn from the students, faculty, staff, and guests of NPS and ranging in age from 28 to 62, served as subjects. All subjects were required to have 20/20 or corrected-to-20/20 vision and normal hearing. Because the experiment did not involve precise measurements of pixel resolution or sampling frequency, vision and hearing tests were not needed. Nevertheless, before the experiment, each subject was asked, as part of a voluntary consent form, whether he or she met the vision and hearing requirements.

D.  APPARATUS 

A  Pentium  166  MHz  personal  computer  with  64  MBytes  main  memory  running 
Microsoft  Windows  NT  4.0  served  as  the  main  hardware  platform  of  the  pilot  study.  The 
low-,  medium-,  and  high-quality  auditory  displays,  described  earlier,  were  generated  by  a 
Sound Blaster 16 PnP audio card [CREA98] and rendered via Sennheiser HD 540 reference II headphones [SENN98]. The low-, medium-, and high-quality visual displays, described earlier, were generated by an Elsa Gloria-8 graphics accelerator card [ELSA98] and rendered via a Sony Multiscan 20-inch sf II computer monitor [SONY98a] set at 800 x 600 resolution. The entire automated experiment was contained within a Netscape Communicator 4.05 HTML browser window [NETS98], using the CosmoPlayer 2.0 VRML plug-in [COSM98] to render the visual-only, auditory-only, and combined auditory-visual displays, and Java pop-up windows developed with JDK 1.1.5 (Java Development Kit) [SUNM98] to collect subject responses.

E.  PROCEDURE 

The experiment involved a 3 x 3 factorial within-subjects design. The two independent variables were visual and auditory display quality. The two dependent variables were the corresponding perceived quality of the auditory and visual displays. The three levels of the visual quality independent variable consisted of low-, medium-, and high-quality visual displays of the radio image depicted earlier in Chapter IV, Figure 32, having resolutions of 425 pixels/inch, 450 pixels/inch, and 500 pixels/inch, respectively. The three levels of the auditory quality independent variable consisted of low-, medium-, and high-quality auditory displays of the same music selection having sampling rates of 11 kHz, 17 kHz, and 44.1 kHz, respectively. Thus, the visual display parameter manipulated was pixel resolution, and the auditory display parameter manipulated was sampling frequency. During each experiment session, which lasted approximately 30 minutes, each subject wore headphones and sat in front of a 20-inch computer display monitor. The subject's task was to rate the perceived quality of audio-only, visual-only, and audio-visual displays as low-, medium-, or high-quality using rating scales.
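The nine cells of the 3 x 3 design, and one reason the sampling-rate levels differ audibly, can be sketched in Python. The level values below come from the text; the dictionary layout and function names are illustrative assumptions. The `nyquist_hz` helper applies the standard sampling-theorem limit: a signal sampled at a rate fs can represent frequencies only up to fs / 2.

```python
from itertools import product

# Pilot-study levels from the text; the data layout itself is illustrative.
VISUAL_PPI = {"low": 425, "medium": 450, "high": 500}         # pixels/inch
AUDIO_HZ = {"low": 11_000, "medium": 17_000, "high": 44_100}  # sampling rate in Hz


def design_cells():
    """Return the nine (visual level, auditory level) cells of the 3 x 3 design."""
    return list(product(VISUAL_PPI, AUDIO_HZ))


def nyquist_hz(sampling_rate_hz):
    """Highest frequency representable at a given sampling rate (fs / 2)."""
    return sampling_rate_hz / 2


if __name__ == "__main__":
    for visual, audio in design_cells():
        fs = AUDIO_HZ[audio]
        print(f"visual={visual:<6} ({VISUAL_PPI[visual]} ppi)  "
              f"audio={audio:<6} ({fs / 1000:g} kHz, content up to "
              f"~{nyquist_hz(fs) / 1000:g} kHz)")
```

Under this limit, the low-, medium-, and high-quality auditory displays can carry content up to roughly 5.5 kHz, 8.5 kHz, and 22.05 kHz, respectively.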

After  reading  a  brief  experimental  overview  and  signing  a  voluntary  consent 
form,  the  subject  was  seated  in  a  chair  facing  the  computer  monitor.  The  subject  was 
instructed  to  adjust  the  seat  height  and/or  monitor  orientation  to  that  which  was  most 
comfortable  and  which  represented  their  typical  computer  monitor  viewing  habit. 
Although a standard viewing position/orientation is highly desirable in experimental design, the focus of this experiment was perception rather than precision. Accordingly, the idea was for subjects to be 1) relaxed, 2) comfortable, and 3) in their typical viewing position/orientation. Nevertheless, no subject sat closer than about one foot or farther than about three feet from the monitor. The subjects were instructed on how to wear and fit the headphones and how to adjust the volume if necessary. To maintain identical testing conditions, it was hoped that no one would need to adjust the previously established headset volume; any subject who did adjust it would have been excluded from the final data analysis. However, no subject needed to adjust the headset volume.

Once the subject was seated and wearing the headphones, an automated computer program contained within an HTML browser window instructed the subject to enter some personal information, as depicted in Figure 41. This information was used to create a unique data file for collecting that subject's data for the remainder of the experiment. The file created is a .csv (comma-separated values) file, which can easily be imported into Microsoft Excel. This was the only time the keyboard was used; for the remainder of the experiment, only the mouse was needed. The automated experiment then presented the subject with a series of instructions fully explaining what was and was not required of the subject. The visual-only, auditory-only, and combined auditory-visual displays were rendered via VRML, and Java pop-up windows collected subject responses. The primary reason for using VRML was the eventual goal of manipulating auditory and visual displays in 3D scenes. Even though only static visual displays are currently used, the idea was to build the foundation of the experiment in VRML to ease the transition to full 3D scenes. Other considerations for using VRML were that 1) it is freely downloadable, 2) it is easy to use, 3) it has a very short learning curve, and 4) it is a new technology worth investigating.

[Screenshot: Netscape "Input Data" form asking the subject to enter Last Name, First Name, Middle Initial, Sex (M or F), Age, Occupation, and Subject and Sequence Number (e.g., 11.21), with a "Press to Enter Your Data" button that must be pressed before continuing.]

Figure 41. Pilot Study: Initial Data Input Screen.
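The per-subject .csv file described above can be sketched as follows. The original program was written in Java/JavaScript; this Python version only illustrates the idea, and the file-naming scheme and column names are assumptions, since the dissertation does not list them.

```python
import csv
from pathlib import Path

# Hypothetical column layout; the actual fields recorded are not listed in the text.
FIELDS = ["trial", "condition", "visual_rating", "auditory_rating", "response_ms"]


def data_file_for(last_name, subject_seq, directory="."):
    """Build a per-subject file name such as 'smith_11.21.csv' (scheme assumed)."""
    safe_name = "".join(ch for ch in last_name.lower() if ch.isalnum())
    return Path(directory) / f"{safe_name}_{subject_seq}.csv"


def start_data_file(path):
    """Create the file with a header row so it opens cleanly in a spreadsheet."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow(FIELDS)


def append_response(path, row):
    """Append one response as a CSV row; fields missing from `row` are left blank."""
    with open(path, "a", newline="") as f:
        csv.DictWriter(f, fieldnames=FIELDS).writerow(row)
```

Writing the header once and appending one row per response keeps each subject's data in a single flat file that a spreadsheet can import directly.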

As the automated experiment continues, the first set of instructions presented to the subject is depicted in Figure 42. The idea is for the subject to memorize the quality differences among the three displays. The same process was then repeated to give the subject another chance to review and memorize the three quality levels. Next, the subject is instructed how to rate the visual-only displays, as depicted in Figure 43. After each visual display has been rendered for seven seconds, it automatically disappears, and a Java pop-up window automatically appears to collect the visual display rating, as depicted in Figure 44. The subject rates a total of nine visual-only displays (three of each quality, low, medium, and high, presented in random order). After rating the visual-only displays, the subject uses exactly the same process to rate nine auditory-only displays (three of each quality, presented in random order) using the auditory rating scale depicted in Figure 45. After rating the auditory displays, the subject is presented with instructions on how to rate the combined auditory-visual displays, as depicted in Figure 46. After each of the 18 combined auditory-visual displays is presented, the subject rates both the auditory and visual displays using the combined auditory-visual rating scale depicted in Figure 47. (The nine combinations of auditory and visual quality are partially counterbalanced using the Latin squares technique and then presented again in reverse order, for a total of 18 combined auditory-visual ratings.) After the subject has rated all of the displays, the automated portion of the experiment terminates. The subject is then asked to complete a brief post-experiment survey of 13 questions, depicted in Figure 48 and Figure 49. After completing the survey, the subject is allowed to ask any overall questions about the experiment. The experiment then ends, and the subject is free to go.

(1) You will now see a sequence of three different visual displays.
    First, a LOW quality visual display will be shown for 7 seconds.
    Second, a MEDIUM quality visual display will be shown for 7 seconds.
    Third, a HIGH quality visual display will be shown for 7 seconds.
(2) No response is required from you at this time.
(3) Later in this experiment, you will be tested on your ability to correctly identify which visual display is LOW, MED, or HIGH quality. Therefore, at this time you should try your best to memorize any differences among the LOW, MED, and HIGH quality visual displays.
[Press to Continue]

Figure 42. Pilot Study: Visual-Only Familiarization Instructions.

(1) You will now be rating the quality of the visual displays which you have just seen.
(2) A total of nine visual displays will be presented randomly.
(3) You will have 7 seconds to see each visual display.
(4) After seeing the visual display, you will be prompted for your rating.
[Press to Continue]

Figure 43. Pilot Study: Visual-Only Rating Instructions.

Visual Display Quality Rating -->   ( ) Low   ( ) Med   ( ) High
[Press to Continue]

Figure 44. Pilot Study: Visual Display Rating Scale.

Window title: Auditory Display Quality Rating Scale
Auditory Display Quality Rating -->   ( ) Low   ( ) Med   ( ) High
[Press to Continue]

Figure 45. Pilot Study: Auditory Display Rating Scale.
(1) You will now be presented a sequence of 18 various combined visual and auditory displays.
(2) These displays consist of the same visual and auditory displays which you have just rated with the same LOW, MEDIUM, and HIGH qualities. However, the visual and auditory displays will now be presented simultaneously. As a result, you might be presented a high quality visual display along with a low quality auditory display, and vice versa. Or you might be presented a high quality visual display along with a high quality auditory display, etc.
(3) Each combined visual and auditory display will be presented randomly for 7 seconds.
(4) After each combined visual and auditory display, you will be tested on your ability to correctly identify whether the visual display is LOW, MED, or HIGH quality, and whether the auditory display is LOW, MED, or HIGH quality.
[Press to Continue]

Figure 46. Pilot Study: Combined Auditory-Visual Rating Instructions.

Window title: Visual and Auditory Display Quality Rating Scales
Visual Display Quality Rating -->     ( ) Low   ( ) Med   ( ) High   <-- Visual
Auditory Display Quality Rating -->   ( ) Low   ( ) Med   ( ) High   <-- Auditory
[Press to Continue]

Figure 47. Pilot Study: Combined Auditory-Visual Rating Scale.
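The trial orderings described in the procedure can be sketched in Python. This is an illustration under stated assumptions, not the experiment's actual Java/JavaScript code: the nine single-modality trials are three of each level shuffled by a random generator, and the combined trials use a cyclic 9 x 9 Latin square (one plausible form of the partial counterbalancing described, since the exact square is not given). The square's row is selected by a hypothetical `subject_index`, and the resulting order is then repeated in reverse.

```python
import random
from itertools import product

LEVELS = ("low", "medium", "high")


def single_modality_order(rng=None):
    """Nine trials for one modality: three of each quality level, shuffled."""
    if rng is None:
        rng = random.Random()
    trials = [level for level in LEVELS for _ in range(3)]
    rng.shuffle(trials)
    return trials


def cyclic_latin_square(n):
    """n x n Latin square whose row r is r, r+1, ..., r+n-1 (mod n)."""
    return [[(r + c) % n for c in range(n)] for r in range(n)]


def combined_order(subject_index):
    """Eighteen combined trials: one Latin-square row of the nine
    (visual, auditory) cells, followed by the same row in reverse."""
    cells = list(product(LEVELS, LEVELS))          # 9 (visual, auditory) pairs
    row = cyclic_latin_square(len(cells))[subject_index % len(cells)]
    forward = [cells[i] for i in row]
    return forward + forward[::-1]


if __name__ == "__main__":
    print(single_modality_order(random.Random(7)))
    for visual, auditory in combined_order(subject_index=2):
        print(f"visual={visual:<6} auditory={auditory}")
```

Across nine subjects, a cyclic square places each of the nine combinations in each serial position exactly once. It does not balance which combination precedes which (cell i is always followed by cell i+1), so counterbalancing of this kind is only partial.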

F.        RESULTS  AND  DISCUSSION 

The results of the pilot study proved invaluable and led to a completely redesigned experiment. Software and hardware problems and procedural problems were identified, and several experimental design criteria were validated; each is discussed below.
Post Experiment Questions

For the following questions, circle the whole number that best represents your response. Circling number 4 means you are indifferent about the question. Use only whole numbers 1 through 7. Do not use fractions.

1. How easy or difficult was it to determine the quality of the visual only displays?
   very easy - 1 2 3 4 5 6 7 - very hard
2. How easy or difficult was it to determine the quality of the auditory only displays?
   very easy - 1 2 3 4 5 6 7 - very hard
3. How easy or difficult was it to determine the quality of the auditory-visual displays?
   very easy - 1 2 3 4 5 6 7 - very hard
4. Would you have liked less or more time to view the visual only displays?
   less time - 1 2 3 4 5 6 7 - more time
5. Would you have liked less or more time to hear the auditory only displays?
   less time - 1 2 3 4 5 6 7 - more time
6. Would you have liked less or more time to hear-see the auditory-visual displays?
   less time - 1 2 3 4 5 6 7 - more time
7. Time wise, was the overall experiment too short or too long?
   too short - 1 2 3 4 5 6 7 - too long
8. Was the experiment mentally exhausting or not?
   not very - 1 2 3 4 5 6 7 - yes very

Auditory-Visual Cross-Modal Experiment (Phase I)   Last Name: ________
Subject and Sequence Number: ________   Date: ________

Figure 48. Pilot Study: Post-Experiment Questions 1-8.

For the following questions, circle yes or no and/or make appropriate comments if applicable.

9. Did you direct your attention to any specific features of the visual display when determining the quality of the visual display?   No   Yes
   If applicable please explain:
10. Did you direct your attention to any specific features of the auditory display when determining the quality of the auditory display?   No   Yes
   If applicable please explain:
11. Were you ever mentally overloaded during any part of the experiment?   No   Yes
   If applicable please explain:
12. Have you participated in an experiment similar to this one?   No   Yes
   If applicable please explain:
13. Any other comments about what you liked or didn't like, or things that should be changed during the course of this experiment?

Auditory-Visual Cross-Modal Experiment (Phase I)   Last Name: ________
Subject and Sequence Number: ________   Date: ________

Figure 49. Pilot Study: Post-Experiment Questions 9-13.
1. Software and Hardware Problems

Perhaps the biggest problem of the pilot study was that the software and hardware used proved to be unstable. A computer hardware problem, which was never isolated, caused four complete system crashes, which required completely reloading Windows NT and all experiment software applications. This hardware problem wasted valuable time for both the subjects and the experimenter, not to mention causing the loss of irreplaceable collected data. Furthermore, the Windows NT operating system crashed on numerous occasions during pilot study development and during experiment sessions, again causing a considerable loss of valuable time and data. The use of VRML also caused unpredictable system crashes. This problem seemed to occur during Java-VRML intercommunication and was signaled by the Microsoft Visual C++ Runtime Library error R6025: Pure Virtual Function Call. Despite numerous attempted fixes, this unpredictable error persisted. Another problem associated with VRML was synchronizing the combined auditory-visual displays, because the synchronization depended on the specifications of the particular audio and video hardware used. As a result, the displays could be synchronized only through trial and error, which was very time consuming. Furthermore, this hardware dependence limits the portability of the experiment, which in turn severely limits the possibility of conducting future on-line experiments. Ultimately, because of the unreliable software and hardware, the pilot study was terminated before the number of data points required for proper data analysis had been collected. However, the results of the 13 subjects who successfully completed the experiment without any system crashes suggest that further examination of auditory-visual cross-modal perception phenomena is warranted. These results are discussed later.

2.  Procedural  Problems 

Identifying procedural errors in the experimental design was another very important contribution of this pilot study. The main procedural errors identified were the visibility of Netscape's status window, the default setting of the rating scales, the time delay between ratings, the narrow range of the rating scales, and memorization versus perception measurement.

a.  Netscape  Status  Window 

When asked about the difficulty of the experiment, one of the test subjects said that it was not too hard to rate the quality of the displays, because he was simply watching Netscape's status window while the displays were loading. He reasoned, correctly, that the larger the file size, the better the quality. Thus, by looking at the status window rather than at the displays, he produced very accurate responses. The immediate correction to this problem was to cover the status bar with a piece of black cloth. Ultimately, it was discovered that the key sequence ctrl-alt-s toggles the appearance of Netscape's status window.
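The subject's shortcut works because file size tracks the manipulated parameters. As a rough sketch, assume uncompressed images of the same physical size, so that pixel count scales with the square of pixels/inch. Also assume uncompressed PCM audio of equal duration and bit depth, so that size scales linearly with sampling rate. Under these assumptions, the pilot-study levels give the following size ratios S relative to the low-quality files. Compressed formats would change the exact figures.

```latex
\frac{S_{\text{visual}}(500\ \text{ppi})}{S_{\text{visual}}(425\ \text{ppi})}
  \approx \left(\frac{500}{425}\right)^{2} \approx 1.38,
\qquad
\frac{S_{\text{visual}}(450\ \text{ppi})}{S_{\text{visual}}(425\ \text{ppi})}
  \approx \left(\frac{450}{425}\right)^{2} \approx 1.12

\frac{S_{\text{audio}}(44.1\ \text{kHz})}{S_{\text{audio}}(11\ \text{kHz})}
  = \frac{44.1}{11} \approx 4.0,
\qquad
\frac{S_{\text{audio}}(17\ \text{kHz})}{S_{\text{audio}}(11\ \text{kHz})}
  = \frac{17}{11} \approx 1.55
```

Because each higher quality level produces a strictly larger file, the download progress shown in the status window was enough to rank the displays without looking at or listening to them.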

b.  Rating  Scales  Default  Setting 

Unbeknownst to the subjects, their response times for rating the various displays were being measured. Analysis of the response-time data showed that response times for medium-quality ratings of the auditory-only, visual-only, and combined auditory-visual displays were significantly shorter than those for high- or low-quality ratings. On examination, the reason was that the medium-quality choice was the default radio-button setting on all of the rating scales, as depicted in Figure 50. As a result, to make a medium-quality choice, the subject needed only to click the Press to Continue button on the rating scale. For low- and high-quality choices, the subject had to select the appropriate radio button and then click the Press to Continue button, which takes longer. This problem was corrected by removing the medium-quality default choice, as depicted earlier in Figure 44.

Visual Display Quality Rating -->   ( ) Low   (•) Med   ( ) High
[Press to Continue]

Figure 50. Pilot Study: Default Visual Quality Rating Scale.
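The correction amounts to a rating prompt with no preselected option, so every response requires the same two actions: select, then continue. A minimal Python model of that behavior follows. The class and method names are hypothetical, and the actual pop-ups were Java windows.

```python
class RatingScale:
    """Hypothetical model of the corrected three-choice rating pop-up:
    no option is preselected, so every rating needs an explicit click."""

    CHOICES = ("Low", "Med", "High")

    def __init__(self):
        self.selection = None  # nothing is preselected (the fix described above)

    def select(self, choice):
        """Record the radio button the subject clicked."""
        if choice not in self.CHOICES:
            raise ValueError(f"unknown choice: {choice!r}")
        self.selection = choice

    def press_to_continue(self):
        """Return the rating; refuse to continue until a choice has been made."""
        if self.selection is None:
            raise RuntimeError("select Low, Med, or High before continuing")
        return self.selection
```

With no default, the number of actions no longer depends on which rating is chosen, removing the response-time advantage that the preselected medium option gave.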

c.  Time  Delay  Between  Ratings 

Because  of  how  VRML  was  implemented  in  the  experimental  design, 
there  was  a  noticeable  time  delay  associated  with  the  loading  and  unloading  of  the 
VRML  Plug-in  to  Netscape.  Many  subjects  complained  that  this  time  delay  caused  them 
to  lose  perspective  on  the  relative  quality  ordering  of  the  displays.  Subjects  wanted  a 
faster turnaround time between quality ratings. A possible correction to this problem is to redesign the use of VRML so that its plug-in is loaded only once, at experiment start-up. However, because this delay compounded the other VRML problems described above, the main experiments were redesigned without 3D VRML, using 2D HTML displays instead.

d.  Narrow  Range  of  Rating  Scales 

Because of the experimental design, the rating scales have a small range, with only three possible values: low, med, and high. This small range introduces unwanted floor and ceiling effects. For example, a high-quality display perceived as even better can still only be rated high, while a display that falls short of high can only be rated medium or low. Likewise, a low-quality display perceived as even worse can still only be rated low, while a display that rises above low can only be rated medium or high. As a result, this three-choice rating scale reduces the ability to properly measure the degree of any perceptual effects caused by the various display qualities. Given the goal of this research effort, a three-choice rating scale severely limits the supporting data analyses. The correction to this problem is addressed later.
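The ceiling and floor argument can be made concrete by coding the ratings numerically. Assume a coding of low = 1, medium = 2, and high = 3 (an illustrative choice, not necessarily the one used in the analysis). Let the first mean rating below be a display's mean when evaluated alone and the second its mean in combination. Any cross-modal shift is then bounded by the ends of the scale:

```latex
\Delta = \bar{r}_{\text{comb}} - \bar{r}_{\text{alone}},
\qquad
1 - \bar{r}_{\text{alone}} \;\le\; \Delta \;\le\; 3 - \bar{r}_{\text{alone}}
```

For a hypothetical high-quality display with a stand-alone mean of 2.9, the largest measurable increase is 3 - 2.9 = 0.1. Any enhancement of that display by a paired auditory display would therefore be almost invisible in the data.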
e. Memorization Versus Perception Measurement

The biggest procedural error was in the overall experimental design. This error stems from the basis on which subjects made their quality ratings; the question is one of measurement. Because the subject's task was to memorize the three auditory and visual display qualities, subjects' responses were more likely based on their ability to memorize the given quality differences than on perceiving potential changes in display quality. Thus, the experiment became more of a matching problem than a measurement of perceptual phenomena. Because of this potential error, the experiment was completely redesigned, as described in the next chapter.

3.    Validated  Design  Criteria 

Several  positive  outcomes  resulted  from  the  pilot  study.  In  analyzing  the  post- 
experiment  surveys,  a  seven-second  duration  of  visual-only,  auditory-only,  and  combined 
auditory-visual  displays  proved  desirable  and  adequate.  The  subjects'  approval  also 
validated  the  overall  length  of  the  experiment,  which  typically  lasted  around  30  minutes. 
Furthermore, the subjects' responses also suggested that, with some effort, all of the displays were noticeably different. This finding was very important because it validated the subjective relative quality ordering of the displays, which in turn validated the technique used to develop the various quality levels of the displays.

G.       SUMMARY  AND  CONCLUSIONS 

Because of the many experimental procedure errors identified during the pilot study, a valid data analysis of the results is neither possible nor desirable. Nevertheless, a few points are worth mentioning. In terms of memorization (the matching problem), the subjects were better able to correctly identify the quality levels of the visual-only and auditory-only displays than the quality levels of the visual and auditory displays when presented in combination. Some subjects were better than others at identifying correct quality levels. Post-hoc analyses also suggested gender differences in identifying correct quality levels, as well as differences in response times. Overall, the results of the pilot study indicate that there are differences in the subjects' ability to correctly match auditory-only, visual-only, and combined auditory-visual displays, and that gender may be a factor in correctly identifying the various displays. In the final analysis, the results of the pilot study greatly facilitated a new and improved experimental design, ultimately supporting the goal of this research effort to investigate auditory-visual cross-modal perception phenomena.
VII.  EXPERIMENT  1:  STATIC  RESOLUTION

A.  INTRODUCTION 

Experiment 1: Static Resolution investigates the perceptual effects of manipulating visual display pixel resolution and auditory display sampling frequency. The visual display consists of a static image of a radio, depicted earlier in Chapter IV, Figure 32, and the auditory display is a selection of music. Specifically, this experiment seeks to answer the following questions:

1) Does a high-quality auditory display coupled with a low-quality visual display
cause  a  decrease/increase  in  the  perception  of  audio  quality  and/or  an  increase/decrease  in 
the  perception  of  visual  quality  relative  to  established  baseline  conditions  derived  from 
auditory-only  and  visual-only  quality  perception  evaluations? 

2)  Does  a  low-quality  auditory  display  coupled  with  a  high-quality  visual  display 
cause  an  increase/decrease  in  the  perception  of  audio  quality  and/or  a  decrease/increase  in 
the  perception  of  visual  quality  relative  to  established  baseline  conditions  derived  from 
auditory-only  and  visual-only  quality  perception  evaluations? 

3)  Does  a  low-quality  auditory  display  coupled  with  a  low-quality  visual  display 
cause  a  decrease/increase  in  the  perception  of  audio  quality  and/or  a  decrease/increase  in 
the  perception  of  visual  quality  relative  to  established  baseline  conditions  derived  from 
auditory-only  and  visual-only  quality  perception  evaluations? 

4)  Does  a  high-quality  auditory  display  coupled  with  a  high-quality  visual  display 
cause  an  increase/decrease  in  the  perception  of  audio  quality  and/or  an  increase/decrease 
in  the  perception  of  visual  quality  relative  to  established  baseline  conditions  derived  from 
auditory-only  and  visual-only  quality  perception  evaluations? 

B.  LOCATION 

All sessions of Experiment 1: Static Resolution were conducted in the same isolated room under the same ambient conditions. The dimensions of the room were approximately 10 feet x 20 feet. Before each session, 1) all nonessential electronic equipment was turned off, 2) telephones were unplugged, 3) windows were closed and covered with blackout cloth, 4) the main overhead lights were turned off, 5) a 60-watt incandescent desk lamp was turned on behind the computer monitor to eliminate any glare, 6) the door to the room was closed, 7) a Do Not Disturb sign was placed on the outside of the door, and 8) the subject was asked to turn off any audible pagers, mobile phones, and/or watches.

C.  PARTICIPANTS 

A total of 36 volunteer participants (18 female, 18 male), drawn from the students, faculty, staff, and guests of NPS, served as subjects. Based on the preliminary findings of the pilot study, the numbers of male and female subjects in this experiment are balanced. The average age of the subjects is 36.5 years, with ages ranging from 15 to 63 (two female subjects did not give their age). All subjects were required to have 20/20 or corrected-to-20/20 vision and normal hearing. Because the experiment did not involve precise measurements of pixel resolution or sampling frequency, vision and hearing tests were not needed. Before the experiment, each subject was asked, as part of a voluntary consent form, whether he or she met the vision and hearing requirements.

D.  APPARATUS 

A  Pentium  200  MHz  (MMX)  personal  computer  with  64  MBytes  main  memory 
running  Microsoft  Windows  95  served  as  the  main  hardware  platform  of  the  experiment. 
The auditory displays are generated by a Sound Blaster AWE64 Gold audio card [CREA98] and rendered via Sennheiser HD 540 reference II headphones [SENN98]. The visual displays are generated by a Diamond Multimedia Viper V330 128-bit graphics accelerator card [DIAM98] and rendered via a Sony Multiscan 20-inch sf II computer monitor [SONY98a] set at 800 x 600 resolution. The entire automated experiment is
contained  within  a  Netscape  Communicator  4.05  HTML  browser  window  [NETS98] 
using  JavaScript  to  render  the  visual-only,  auditory-only,  and  combined  auditory-visual 
displays.  Java  pop-up  windows,  developed  using  JDK  1.1.5  [SUNM98],  were  used  to 
collect  subject  responses. 
E.  PROCEDURE

The experiment used a 3 x 3 factorial within-subjects design. The two independent variables
are visual and auditory display quality. The two dependent variables are the corresponding
perceived quality of the auditory and visual displays. The three levels of the visual quality
independent variable consist of low-, medium-, and high-quality visual displays of the radio
image depicted earlier in Chapter IV, Figure 32, having resolutions of 350 pixels/inch,
450 pixels/inch, and 550 pixels/inch, respectively. The three levels of the auditory quality
independent variable consist of low-, medium-, and high-quality auditory displays of the same
music selection, presented monophonically, having sampling rates of 11 kHz, 23 kHz, and 35 kHz,
respectively. Thus, the manipulated visual display parameter is pixel resolution, and the
manipulated auditory display parameter is sampling frequency. During the experiment, which
lasts approximately 30 minutes, each subject wears headphones and sits in front of a 20-inch
computer display monitor. The subject's task is to rate the perceived quality of auditory-only,
visual-only, and auditory-visual displays using Likert rating scales ranging from 1 (low) to
7 (high).
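As a minimal sketch (Python is used here only for illustration and was not part of the original apparatus), the nine auditory-visual conditions implied by the 3 x 3 design can be enumerated directly from the levels given above. The names `VISUAL_LEVELS`, `AUDIO_LEVELS`, and `conditions` are hypothetical.

```python
from itertools import product

# Levels stated in the text: visual pixel resolution (pixels/inch) and
# auditory sampling rate (kHz) for the low, medium, and high quality levels.
VISUAL_LEVELS = {"low": 350, "medium": 450, "high": 550}
AUDIO_LEVELS = {"low": 11, "medium": 23, "high": 35}


def conditions():
    """Return the nine (visual, auditory) quality combinations of the 3 x 3 design."""
    return [
        (visual, audio, VISUAL_LEVELS[visual], AUDIO_LEVELS[audio])
        for visual, audio in product(VISUAL_LEVELS, AUDIO_LEVELS)
    ]


if __name__ == "__main__":
    for visual, audio, ppi, khz in conditions():
        print(f"visual={visual:<6} ({ppi} pixels/inch)  audio={audio:<6} ({khz} kHz)")
```

Every subject sees all nine combinations, which is what makes the design within-subjects.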

After reading a brief experimental overview and signing a voluntary consent form, the subject
is seated in a chair facing the computer monitor. The subject is instructed to adjust the seat
height and/or monitor orientation to whatever is most comfortable and best represents his or
her typical computer monitor viewing habits. Although a standard viewing position and
orientation is highly desirable in experimental design, the focus of this experiment is not on
precision, but rather on perception. Accordingly, the idea was for subjects to be 1) relaxed,
2) comfortable, and 3) in their typical viewing position and orientation. Nevertheless, no
subject sat closer than about one foot or farther than about three feet from the computer
monitor. The subjects are instructed on how to wear and fit the headphones, and also how to
adjust the volume if necessary. To maintain identical testing conditions, it was hoped that no
subject would need to adjust the headset volume; in fact, none did.

Once the subject is seated and wearing the headphones, an automated computer program contained
within an HTML browser window instructs the subject to enter some personal data, as depicted
in Figure 51. (Note that Netscape's status window is not visible at the bottom of the screen,
as compared with that of the pilot study depicted earlier in Chapter VI, Figure 41.) This
personal data is used to create a unique data file that collects the specific subject's data
for the remainder of the experiment. The file created is a .csv (comma-separated values) file,
which can easily be imported into Microsoft Excel. This is the only time the keyboard is used;
for the remainder of the experiment, only the mouse is needed. The automated experiment
continues by presenting the subject with a series of instructions giving a full explanation of
what is and is not required of the subject. The visual-only, auditory-only, and combined
auditory-visual displays are rendered via JavaScript, and Java pop-up windows collect subject
responses.

Figure 51. Experiment 1: Data Input Screen.

You will now be presented two Visual Displays.

One display is of 'Low Quality' and the other is of 'High Quality'.

To see the 'Low Quality' display, click on the 'LOW QUALITY' link

To see the 'High Quality' display, click on the 'HIGH QUALITY' link

You can view either display as long as you like.

You can go back and forth between the displays as many times as you like

Later in this experiment, you will be tested on your ability to correctly
identify various quality levels of visual displays. Therefore, at this time
you should try your best to memorize what is considered to be a 'Low Quality' display,
and what is considered to be a 'High Quality' display. When you are ready to
begin rating the quality of visual displays, click on the 'FINISHED' link.

Press to Continue

Figure 52. Experiment 1: Visual Display Instructions.

As the automated experiment continues, the subject is first presented with a series of
instructions, displays, and rating scales in order to 1) ensure the headphones are working
properly, 2) familiarize the subject with how the visual displays will be presented on the
computer monitor, and 3) familiarize the subject with what the rating scales look like, how
they will appear and disappear automatically, and how to use them. After this familiarization
process, the subject is presented with the instructions depicted in Figure 52. The idea is for
the subject to memorize the quality differences between the lowest and highest quality visual
displays. As a result, the subject calibrates himself or herself to the maximum possible
quality range spanned by the low- and high-quality extremes. During this process, the subject
directly controls the viewing of the low- and high-quality displays simply by clicking on
either the LOW QUALITY or HIGH QUALITY hypertext link. Figure 53 depicts the appearance of the
low-quality visual display having 250 pixels/inch, and Figure 54 depicts the appearance of the
high-quality visual display having 600 pixels/inch. Note that the original displays were in
color, and that the actual pixel resolution experienced by the subject can only be seen on the
actual 20-inch computer monitor. However, the low- and high-quality displays depicted in
Figure 53 and Figure 54 are fairly good representations of the quality difference between the
actual displays used in the experiment. When the subject is ready to begin rating the visual
displays, he or she clicks on the FINISHED hypertext link. The subject is then presented with
the instructions depicted in Figure 55.

When the subject is ready, each visual display is rendered for eight seconds, after which it
automatically disappears and a Java pop-up window automatically appears to facilitate rating
the visual display, as depicted in Figure 56. The subject rates a total of nine visual-only
displays (three of each quality, low, medium, and high, presented in random order). After
rating the visual-only displays, the subject uses the same process as with the visual displays
to memorize the quality differences between the lowest and highest quality auditory displays,
which correspond to 8 kHz and 44.1 kHz, respectively. The subject then uses exactly the same
process as with the visual displays to rate nine auditory-only displays (three of each quality,
presented in random order), using the auditory rating scale depicted in Figure 57.

Figure 53. Experiment 1: Low-Quality Visual Display Familiarization.

Figure 54. Experiment 1: High-Quality Visual Display Familiarization.

You will now be rating the quality of visual displays.

Base your ratings on the Low and High visual displays depicted earlier.

For example, if the visual display you are rating appears to look
like that of the previously shown Low quality display, your rating
should be '1' for 'Low'. If the visual display you are rating appears
to be of better quality than that of the previously shown Low quality
display, your rating should be somewhere in the range from '2' to '7'.

A total of 9 visual displays will be presented randomly.

You will have 8 seconds to see each visual display.

After seeing the visual display, you will be prompted for your rating.

Press to Continue

Figure 55. Experiment 1: Visual Display Rating Instructions.

Figure 56. Experiment 1: Visual Display Quality Rating Scale.

Figure 57. Experiment 1: Auditory Display Quality Rating Scale.

After rating the auditory displays, the subject is presented with instructions on rating only
the visual quality of nine combined auditory-visual displays, as depicted in Figure 58 (the
nine permutations of the auditory and visual qualities are partially counterbalanced through
the Latin squares technique). The subject is then presented with instructions on rating only
the auditory quality of nine combined auditory-visual displays, again partially
counterbalanced through the Latin squares technique, as depicted in Figure 59. Finally, the
subject is presented with instructions on rating 18 combined auditory-visual displays, as
depicted in Figure 60. These 18 displays consist of the nine permutations of the auditory and
visual qualities, partially counterbalanced through the Latin squares technique, followed by
the same nine permutations presented in reverse order. After each of the 18 combined
auditory-visual displays is presented, the subject rates both the auditory and visual displays
using the combined auditory-visual rating scale depicted in Figure 61.

After the subject has completed rating all of the displays, the automated portion of the
experiment terminates. The subject is then asked to complete a brief post-experiment survey
consisting of 13 questions. This survey is identical to the one used in the pilot study, as
depicted earlier in Chapter VI, Figure 48 and Figure 49. After completing the post-experiment
questions, the subject is allowed to ask any overall questions about the experiment. The
experiment is then terminated, and the subject is free to go.

(1) You will now be rating the VISUAL quality of a combined audio-visual display.

(2) A total of 9 audio-visual displays will be presented randomly.

(3) Each audio-visual display will be presented for 8 seconds.

(4) After which, you will be prompted ONLY for your VISUAL rating.

Press to Continue

Figure 58. Experiment 1: Visual-Only Rating Instructions When Given A Combined Auditory-Visual
Display.

(1) You will now be rating the AUDIO quality of a combined audio-visual display

(2) A total of 9 audio-visual displays will be presented randomly.

(3) Each audio-visual display will be presented for 8 seconds.

(4) After which, you will be prompted ONLY for your AUDIO rating.

Press to Continue

Figure 59. Experiment 1: Auditory-Only Rating Instructions When Given A Combined
Auditory-Visual Display.

(1) You will now be rating the audio AND visual quality of a combined audio-visual display.

(2) A total of 18 audio-visual displays will be presented randomly.

(3) Each audio-visual display will be presented for 8 seconds.

(4) After which, you will be prompted for your audio AND visual rating.

Press to Continue

Figure 60. Experiment 1: Combined Auditory-Visual Rating Instructions.

Figure 61. Experiment 1: Combined Auditory-Visual Rating Scale.
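The text does not specify exactly how the Latin squares technique was applied. The sketch below assumes one simple possibility: a cyclic 9 x 9 Latin square over the nine auditory-visual permutations, with each subject assigned one row. Under that assumption every permutation occupies every serial position exactly once across nine consecutive subjects, which balances position but not first-order carryover, one sense in which counterbalancing can be called partial. The 18-trial block appends the same order reversed. The functions `latin_square_row` and `combined_block` are hypothetical.

```python
from itertools import product

QUALITIES = ("low", "medium", "high")
# The nine (visual, auditory) quality permutations of the 3 x 3 design.
CONDITIONS = list(product(QUALITIES, QUALITIES))


def latin_square_row(subject_index, items=CONDITIONS):
    """Row `subject_index` of a cyclic Latin square over `items` (assumed scheme).

    Across len(items) consecutive subjects, each condition appears in each
    serial position exactly once.
    """
    n = len(items)
    shift = subject_index % n
    return [items[(shift + i) % n] for i in range(n)]


def combined_block(subject_index):
    """18-trial sequence: the Latin-square order followed by the same order reversed."""
    order = latin_square_row(subject_index)
    return order + order[::-1]


if __name__ == "__main__":
    block = combined_block(subject_index=4)
    first_nine = block[:9]  # the analysis uses only the first nine of the 18 trials
    for trial, (visual, audio) in enumerate(first_nine, start=1):
        print(trial, f"visual={visual}", f"audio={audio}")
```

A balanced (Williams) Latin square would additionally control first-order carryover; whether anything of that kind was used here is not stated.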

F.        CHANGES  FROM  PILOT  STUDY 


The following discussion describes how the results of the pilot study were incorporated into
the redesign of this experiment and how these changes affected the overall execution of the
main experiment.

1.  Software  and  Hardware  Functionality 

Switching to a new hardware platform proved extremely reliable; the platform never exhibited
any problems. Switching to Microsoft Windows 95 also proved very reliable, since the operating
system never once crashed. Eliminating the use of VRML also eliminated the system crashes
associated with the Microsoft Visual C++ Runtime Library error number R6025: Pure Virtual
Function Call. Furthermore, by using JavaScript instead of VRML, the combined auditory-visual
displays were automatically synchronized when rendered. This eliminated the trial-and-error
process associated with VRML, ultimately saving considerable time and effort during the
development of the main experiment, and better supports the portability of the experiment
toward the eventual goal of conducting future on-line experiments.

2.  Procedural  Changes 

a.  Netscape  Status  Window 

The black cloth used to cover Netscape's Status Window on the computer monitor was no longer
needed once it was learned that the key sequence ctrl-alt-s toggles the Status Window on and
off. This not only made the experiment appear more professional, but also slightly increased
the size of the viewing display area.

b.  Rating  Scales  Default  Setting 

By eliminating any default setting on the rating scales, the measurement of the subject's
response time became uniform across all possible ratings (no rating could be entered faster
simply because it was preselected), thereby allowing proper analysis of response time.

c.  Time  Delay  Between  Ratings 

Eliminating the use of VRML likewise eliminated the time required to load and unload the VRML
plug-in. As a result, with JavaScript there was practically no perceivable time delay between
ratings. Because the transition between ratings was now effectively instantaneous, the overall
time required to complete the experiment was substantially reduced. This made it possible to
add further data collection to the experimental design without increasing the overall duration
of the experiment. As with the pilot study, subjects completed the experiment in about
30 minutes.

d.  Range  of  Rating  Scales 

Because the range of all rating scales was increased from three to seven choices, floor and
ceiling effects were substantially reduced, if not eliminated altogether. This increased range
makes it possible to measure finer degrees of any perceptual effects caused by the displays of
various quality.
e.    Elimination of the Matching Problem

The matching (memorization) problem of the pilot study was eliminated by no longer requiring
the subjects to memorize all three (low, medium, and high) display qualities. In this
experiment, the subject is only required to memorize the lowest and highest possible quality
extremes. During the rating process, the subject is never re-exposed to the lowest and highest
quality displays. Furthermore, the subject is not aware of how many quality levels are actually
being presented. Since there are seven possible choices on the rating scales, not three, the
subject can only guess that there may be as many as seven quality levels for both the auditory
and visual displays. By memorizing only the lowest and highest possible quality extremes, each
subject, in essence, self-calibrates when rating the displays that fall between them. In fact,
unbeknownst to the subject, only three quality levels (low, medium, and high) are presented.
Thus, when rating the various auditory and visual displays, the rating process becomes purely
subjective (perceptual) and is not based on memorizing the exact quality level of a particular
display.

f.    Duration of Displays

During the pilot study, all displays were rendered for seven seconds; in this experiment,
however, all displays were rendered for eight seconds. The reason for increasing the length of
the displays by one second had to do with the auditory display development for the follow-on
experiment, Experiment 2: Static Noise. In that experiment, which is described in the next
chapter, Gaussian white noise level is the manipulated auditory display parameter. A one-half
second fade-in and fade-out of the Gaussian white noise was therefore added to the auditory
display to avoid the abrupt onset of the rendered noise, which is somewhat shocking and
startling if unexpected and might cause subjects to become uneasy or unnerved. Thus, to keep
display duration consistent across all experiments, all displays in every experiment were
rendered for eight seconds.
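The half-second fades described above can be illustrated with a short NumPy sketch. It assumes a 44.1 kHz sample rate, linear gain ramps, and an arbitrary noise standard deviation; the actual displays were rendered in the browser, and their fade shape and noise level are not given in this section. The function names are hypothetical.

```python
import numpy as np


def fade_envelope(n_samples, fade_samples):
    """Gain envelope: linear ramp from 0 to 1, flat at 1.0, linear ramp back to 0."""
    if 2 * fade_samples > n_samples:
        raise ValueError("fade-in plus fade-out is longer than the display")
    envelope = np.ones(n_samples)
    if fade_samples > 0:
        ramp = np.linspace(0.0, 1.0, fade_samples)
        envelope[:fade_samples] = ramp
        envelope[n_samples - fade_samples:] = ramp[::-1]
    return envelope


def faded_white_noise(duration_s=8.0, fade_s=0.5, sample_rate=44_100,
                      noise_std=0.1, seed=None):
    """Zero-mean Gaussian white noise shaped by fade_envelope (assumed parameters)."""
    n_samples = int(round(duration_s * sample_rate))
    fade_samples = int(round(fade_s * sample_rate))
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, noise_std, n_samples)
    return noise * fade_envelope(n_samples, fade_samples)


if __name__ == "__main__":
    signal = faded_white_noise(seed=1)
    print(signal.shape, signal[:3], signal[-3:])
```

At 44.1 kHz, each half-second ramp spans 22,050 samples, so an eight-second display is 352,800 samples long, with the noise at full level only between the two ramps.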

G.   DATA  COLLECTION  AND  ANALYSIS 

Before  the  results  of  the  experiment  are  discussed,  it  is  important  to  understand 
the  nature  of  the  data  collected  and  the  chosen  method  of  data  analysis. 

1.  Data  Collection 

To understand the method of data analysis, it is first necessary to understand the method of
data collection. The idea of the experiment was to first capture the subject's quality
perception of the visual-only and auditory-only displays. During this initial portion of the
experiment, subjects rate nine displays consisting of three low-, three medium-, and three
high-quality displays presented in random order. The average rated value for each quality
level establishes the subject's baseline quality rating for the low-, medium-, and high-quality
displays. This baseline quality rating can then be compared to all subsequent quality ratings.
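The baseline computation amounts to averaging each subject's three ratings per quality level. The sketch below does this and then forms the per-permutation differences (combined rating minus baseline) that the null hypotheses below are concerned with. The record layout, function names, and sample ratings are hypothetical and are not data from this experiment.

```python
from collections import defaultdict


def mean_by_quality(trials):
    """Average the 1-7 ratings for each quality level.

    trials: iterable of (quality, rating) pairs, e.g. ("high", 6).
    """
    grouped = defaultdict(list)
    for quality, rating in trials:
        grouped[quality].append(rating)
    return {quality: sum(r) / len(r) for quality, r in grouped.items()}


def differences_from_baseline(baseline_trials, combined_trials):
    """Combined rating minus the baseline mean for the rated display's quality.

    combined_trials: iterable of ((rated_quality, other_quality), rating) pairs,
    one per auditory-visual permutation (assumed layout).
    """
    baseline = mean_by_quality(baseline_trials)
    return {
        (rated, other): rating - baseline[rated]
        for (rated, other), rating in combined_trials
    }


# Hypothetical visual-only baseline ratings for one subject.
baseline_visual = [("low", 1), ("low", 2), ("low", 1),
                   ("medium", 4), ("medium", 4), ("medium", 5),
                   ("high", 6), ("high", 7), ("high", 6)]
# Hypothetical visual ratings of two combined displays, keyed (visual, auditory).
combined_visual = [(("high", "high"), 7), (("high", "low"), 6)]

if __name__ == "__main__":
    print(mean_by_quality(baseline_visual))
    print(differences_from_baseline(baseline_visual, combined_visual))
```

Collecting one such difference per subject for a given permutation yields the observations that a sign test can then examine.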

During the next portion of the experiment, subjects rate only the visual display quality of a
combined auditory-visual display. The subject is presented with nine combined auditory-visual
displays corresponding to the nine permutations formed by the three auditory and three visual
display qualities. The ordering of these nine displays is partially counterbalanced through the
Latin squares technique. As such, the subject again rates three low-, three medium-, and three
high-quality visual displays. The average rated value for each quality level establishes the
subject's visual quality rating for each low-, medium-, and high-quality display when presented
in combination with the three quality levels of the auditory displays.

During the next portion of the experiment, subjects rate only the auditory display quality of a
combined auditory-visual display. The subject is presented with nine combined auditory-visual
displays corresponding to the nine permutations formed by the three auditory and three visual
display qualities. The ordering of these nine displays is again partially counterbalanced
through the Latin squares technique. As such, the subject again rates three low-, three
medium-, and three high-quality auditory displays. The average rated value for each quality
level establishes the subject's auditory quality rating for each low-, medium-, and
high-quality display when presented in combination with the three quality levels of the visual
displays.

During the final portion of the experiment, subjects rate both the auditory and visual display
qualities of a combined auditory-visual display. The subject is presented with 18 combined
auditory-visual displays corresponding to 1) the nine permutations formed by the three auditory
and three visual display qualities, partially counterbalanced through the Latin squares
technique, and 2) the same nine permutations in reverse order. As such, within each set of
nine, the subject rates, yet again, three low-, three medium-, and three high-quality visual
displays and auditory displays. The average rated value for each quality level establishes the
subject's visual and auditory quality ratings for each low-, medium-, and high-quality display
when having to rate both visual and auditory displays simultaneously. However, for consistency
with the next two experiments, only the first nine of the 18 combined auditory-visual displays
are used in the data analysis.

The response time, the time taken to rate each display, was also collected; however, the
subject was not aware of this. A conscious decision was made not to inform the subject, to
avoid the possibility of the subject thinking that the faster the response, the better the
score, as in some kind of race. The idea is to keep the subject as relaxed as possible so that
the subject's decisions are based purely on perception, and not on time (speed) related
factors.

2.  Data  Analysis 

As in any experiment, valid data analysis is critical. The first step towards a valid data
analysis involves understanding and identifying the type of data collected, such as nominal,
ordinal, interval, and continuous. In this experiment, all the quality ratings collected are
considered ordinal data, because they are derived from rating scales used to rank the perceived
quality of the displays on a scale of 1 (lowest) to 7 (highest). Unlike interval data, ordinal
ratings do not guarantee that the difference in quality between the low and medium displays is
the same as the difference in quality between the medium- and high-quality displays. This is a
very important point, which must be considered when selecting the proper data analysis method.

The underlying distribution of the data is another very important factor in deciding how to
analyze the data. Parametric data analysis can be used when a certain underlying distribution
of the data is assumed. Nonparametric methods are used to test hypotheses about data whose
underlying distribution is not assumed. Thus, because this research does not assume a certain
underlying distribution of the data, a nonparametric data analysis method is used: specifically,
a one sample sign test, which compares the number of observations above and below a
hypothesized value (here, zero, as described below). To answer the questions outlined earlier
supporting the goal of this experiment, the one sample sign test is used to investigate the
following null hypotheses:

1) The difference between a) the visual-only quality rating of a combined
auditory-visual display, and b) the baseline rating for the visual-only quality display is
zero.

2)  The  difference  between  a)  the  auditory-only  quality  rating  of  a  combined 
auditory-visual  display,  and  b)  the  baseline  rating  for  the  auditory-only  quality  display  is 
zero. 

3)  The  difference  between  a)  the  visual  quality  rating  of  a  combined  auditory- 
visual  display  when  also  rating  the  auditory  display,  and  b)  the  baseline  rating  for  the 
visual-only  quality  display  is  zero. 
4) The difference between a) the auditory quality rating of a combined auditory-visual
display when also rating the visual display, and b) the baseline rating for the
auditory-only quality display is zero.

Specifically, for each of the following ratings, a one sample sign test compares the number of
observations in which the rating falls above, versus below, the corresponding auditory-only or
visual-only baseline rating (that is, the number of positive versus negative differences):
1) the visual-only quality rating of a combined auditory-visual display, 2) the auditory-only
quality rating of a combined auditory-visual display, 3) the visual quality rating of a combined
auditory-visual display when also rating the auditory display, and 4) the auditory quality
rating of a combined auditory-visual display when also rating the visual display. The results
of these one sample sign tests form the foundation of all major findings in this research
effort. The significance (alpha) level for all findings is set at .05; in other words, the
probability of making a Type I Error, that is, of rejecting the null hypothesis when in fact the
null hypothesis is true, is held to at most .05. As such, only results with P-values at or below
.05 will be reported as significant. The P-value is the probability, assuming the null
hypothesis is true, of obtaining a result at least as extreme as the one actually observed.
Thus, the smaller the P-value, the greater the confidence in rejecting the null hypothesis,
which in turn supports the alternative hypothesis (see [GOOD95] for more discussion on alpha
level, null hypothesis, alternative hypothesis, and Type I Error).
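A one sample sign test of the kind described above can be computed directly from the binomial distribution with p = 0.5. This sketch assumes an exact two-sided test in which differences equal to the hypothesized value (zero) are discarded, a common convention; the text does not state which variant produced the reported P-values. The function name `sign_test` and the differences in the usage example are invented for illustration.

```python
from math import comb


def sign_test(differences, hypothesized=0.0):
    """Exact two-sided one sample sign test (assumed variant).

    Returns (number above, number below, two-sided P-value). Observations equal
    to the hypothesized value are dropped before testing.
    """
    plus = sum(1 for d in differences if d > hypothesized)
    minus = sum(1 for d in differences if d < hypothesized)
    n = plus + minus
    if n == 0:
        return plus, minus, 1.0
    k = min(plus, minus)
    # Probability of k or fewer signs in the rarer direction under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return plus, minus, min(1.0, 2 * tail)


if __name__ == "__main__":
    ALPHA = 0.05
    # Hypothetical per-subject differences (combined rating minus baseline).
    diffs = [1, 1, 0.5, 0, 2, 1, -0.5, 1, 1, 1, 2, 1]
    plus, minus, p = sign_test(diffs)
    verdict = "significant" if p <= ALPHA else "not significant"
    print(plus, minus, round(p, 4), verdict)
```

Because only the signs of the differences are used, the test does not depend on the distance between rating-scale points, which suits the ordinal ratings discussed earlier.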

H.       RESULTS  AND  DISCUSSION 

The overall results of this experiment suggest significant auditory-visual cross-modal
perception phenomena relevant to VE and multimedia developers. The major findings of this
experiment are now discussed.

1.  Validity 

The first and most important consideration is whether the subjects rank ordered the visual and
auditory displays developed for this experiment according to their intended quality rankings.
If this were not the case, the validity of the experiment would be jeopardized. However,
Figure 62 shows that the overall quality ratings of the visual displays are properly rank
ordered by the subjects according to this experiment's intended low-, medium-, and high-quality
rankings. Likewise, Figure 63 shows that the overall quality ratings of the auditory displays
are properly rank ordered by the subjects according to this experiment's intended low-,
medium-, and high-quality rankings. Given that the ratings of all displays are properly rank
ordered, data analysis with respect to the hypotheses can continue.

Figure 62. Experiment 1: Visual-Only Quality Percept Ratings.

2.  Findings 

Figure 64 represents the results of all one sample sign tests based on the first null
hypothesis, which states: the difference between a) the visual-only quality rating of a
combined auditory-visual display, and b) the baseline rating for the visual-only quality
display is zero. As the results show, when subjects were presented a combined high-quality
visual and high-quality auditory display and asked to rate only the quality of the visual
display, a statistically significant finding at the .0161 level (a P-value of .0161) suggests
that the quality perception of a high-quality visual display is increased when coupled with a
high-quality auditory display.

Figure 63. Experiment 1: Auditory-Only Quality Percept Ratings.

Figure 65 represents the results of all one sample sign tests based on the second null
hypothesis, which states: the difference between a) the auditory-only quality rating of a
combined auditory-visual display, and b) the baseline rating for the auditory-only quality
display is zero. As the results show, when subjects were presented a combined low-quality
auditory and high-quality visual display and asked to rate only the quality of the auditory
display, a statistically significant finding at the .0002 level strongly suggests that the
quality perception of a low-quality auditory display is decreased when coupled with a
high-quality visual display.
Figure 64. Experiment 1: One Sample Sign Tests for Visual-Only Quality Percept of Combined
Auditory-Visual Displays.

Figure 66 represents the results of all one-sample sign tests based on the third null
hypothesis, which states: the difference between a) the visual quality rating of a combined
auditory-visual display when also rating the auditory display and b) the baseline rating
for the visual-only quality display is zero. As the results show, none of the differences is
significant at the .05 level. It is worth noting, however, that when subjects were presented
a high-quality visual display coupled with either a medium- or high-quality auditory
display and asked to rate both displays, results significant at the .10 level suggest that the
quality perception of the high-quality visual display is increased.
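The reported p-values come from one-sample sign tests. As a minimal sketch (not the statistics package actually used to produce the figures), the exact two-sided p-value can be computed from the binomial distribution once ties are dropped, and then labeled against the .05 and .10 levels used in the text. The function names are illustrative and the counts in the usage line are hypothetical.

```python
from math import comb


def sign_test_p(n_greater: int, n_less: int) -> float:
    """Exact two-sided p-value of a one-sample sign test.

    Ties with the hypothesized value are excluded before calling this.
    Under the null hypothesis, each remaining difference is positive with
    probability 1/2, so the number of positive differences is Binomial(n, 1/2).
    """
    n = n_greater + n_less
    if n == 0:
        return 1.0
    k = min(n_greater, n_less)
    # P(X <= k) for the smaller count, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def describe(n_greater: int, n_less: int) -> str:
    """Report the p-value, the significance level reached, and the direction of the shift."""
    p = sign_test_p(n_greater, n_less)
    if n_greater > n_less:
        direction = "increased"
    elif n_less > n_greater:
        direction = "decreased"
    else:
        direction = "unchanged"
    if p < 0.05:
        level = "significant at .05"
    elif p < 0.10:
        level = "significant at .10 only"
    else:
        level = "not significant"
    return f"p = {p:.4f}, {level}, rating {direction}"


# Hypothetical counts: 8 subjects rated above baseline, 23 below (ties excluded).
print(describe(8, 23))  # p = 0.0107, significant at .05, rating decreased
```

Doubling the smaller tail and capping the result at 1 is the usual convention for the exact two-sided sign test, so a perfectly balanced split yields p = 1.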
Figure 65. Experiment 1: One Sample Sign Tests for Auditory-Only Quality
Percept  of  Combined  Auditory-Visual  Displays. 

Figure 67 represents the results of all one-sample sign tests based on the fourth null
hypothesis, which states: the difference between a) the auditory quality rating of a
combined auditory-visual display when also rating the visual display and b) the baseline
rating for the auditory-only quality display is zero. The results suggest that 1) when
subjects were presented a combined low-quality auditory and high-quality visual display
and asked to rate both displays, the quality perception of the low-quality auditory display is
decreased (p = .0107), and 2) when subjects were presented a combined high-quality
auditory and low-quality visual display and asked to rate both displays, the quality
perception of the high-quality auditory display is increased (p = .0241).

Figure 66. Experiment 1: One Sample Sign Tests for Visual Quality Percept When
Also Rating the Auditory Display of Combined Auditory-Visual Displays.

In terms of response times, Figure 68 represents the average visual quality rating response
times of a combined auditory-visual display when subjects were asked to rate only the
quality of the visual display. Figure 69 represents the average auditory quality rating
response times of a combined auditory-visual display when subjects were asked to rate
only the quality of the auditory display. Figure 70 represents the average combined
auditory and visual quality rating response times of a combined auditory-visual display
when subjects were asked to rate both the auditory and visual displays.

[Figure 67 panels: one-sample sign tests (hypothesized value: 0) for A2V2 CAV Diff through A6V6 CAV Diff, each reporting # Obs. > Hyp. Value, # Obs. < Hyp. Value, # Obs. = Hyp. Value, and P-Value.]

A2V2 CAV = Low-Quality Auditory Percept of Combined Low-Auditory and Low-Visual Quality Display
A2V4 CAV = Low-Quality Auditory Percept of Combined Low-Auditory and Med-Visual Quality Display
A2V6 CAV = Low-Quality Auditory Percept of Combined Low-Auditory and High-Visual Quality Display
A4V2 CAV = Med-Quality Auditory Percept of Combined Med-Auditory and Low-Visual Quality Display
A4V4 CAV = Med-Quality Auditory Percept of Combined Med-Auditory and Med-Visual Quality Display
A4V6 CAV = Med-Quality Auditory Percept of Combined Med-Auditory and High-Visual Quality Display
A6V2 CAV = High-Quality Auditory Percept of Combined High-Auditory and Low-Visual Quality Display
A6V4 CAV = High-Quality Auditory Percept of Combined High-Auditory and Med-Visual Quality Display
A6V6 CAV = High-Quality Auditory Percept of Combined High-Auditory and High-Visual Quality Display

Figure 67. Experiment 1: One Sample Sign Tests for Auditory Quality Percept
When Also Rating the Visual Display of Combined Auditory-Visual Displays.

Figure 68. Experiment 1: Visual-Only Quality Rating Response Times of a
Combined Auditory-Visual Display.

In looking at the response-time results, one can see various trends depending on the
particular auditory-visual quality combination. However, several factors limit the ability
to analyze these temporal results in any statistically valid manner. These factors are
discussed in the last chapter. Nevertheless, one key observation is worth mentioning: the
response time to rate the visual-only display of a combined auditory-visual display is the
only occasion in the entire experiment where gender appears to be a factor. As Figure 71
shows, females needed more time than males to rate the visual displays in every condition.
The reason for this is not known. One possibility is that it is harder for females to filter out
the auditory information while trying to attend only to the visual display. Another possible
reason is the competitive nature of males: males might have been more prone to answer as
quickly as possible, whereas females simply took as much time as they felt they needed.

Figure 69. Experiment 1: Auditory-Only Quality Rating Response Times of a
Combined Auditory-Visual Display.

In terms of the post-experiment questions, Figure 72 represents the subjects' opinions on
1) how easy or difficult it was to determine the quality of the various displays, and 2)
whether less or more time was needed to adequately rate the various displays. Keeping in
mind that subjects rated their opinions on a Likert scale ranging from 1 to 7 (4 being
neutral), the results indicate that determining the quality of both the auditory and visual
displays of a combined auditory-visual display was more difficult than determining the
quality of either the auditory or the visual display alone, whether that display was
presented by itself or in combination. Furthermore, the results indicate that eight seconds
was an adequate amount of time to rate the visual-only and auditory-only displays, but
that slightly more than eight seconds was desired when rating the combined
auditory-visual displays.

Figure 70. Experiment 1: Response Times of Both Auditory and Visual
Displays of a Combined Auditory-Visual Display.
Figure 71. Experiment 1: Comparison of Male and Female Response Times When
Rating  a  Visual-Only  Display  of  a  Combined  Auditory-Visual  Display. 


Finally, the remaining questions of the post-experiment survey reveal that 31 of the 36
subjects (86.1%) focused on alphanumerics to determine the quality of the visual displays,
and that 20 of the 36 subjects (55.6%) felt mentally overloaded when having to rate both
auditory and visual displays simultaneously. Several interesting observations were also
made concerning the descriptions subjects used to determine the quality of the various
displays. These observations are outlined in the final chapter.
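Each survey percentage is simply the count divided by the 36 subjects, rounded to one decimal place. A short check using Python's decimal module with round-half-up rounding (the helper name is illustrative):

```python
from decimal import ROUND_HALF_UP, Decimal


def percent_of(count: int, total: int = 36) -> Decimal:
    """Percentage of `total` subjects, rounded half-up to one decimal place."""
    return (Decimal(count) * 100 / Decimal(total)).quantize(
        Decimal("0.1"), rounding=ROUND_HALF_UP
    )


print(percent_of(31))  # 86.1
print(percent_of(20))  # 55.6
```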
Q1 = How easy or difficult was it to determine the quality of the visual-only displays?

Q2 = How easy or difficult was it to determine the quality of the auditory-only displays?

Q3 = How easy or difficult was it to determine the visual quality of the auditory-visual displays?

Q4 = How easy or difficult was it to determine the auditory quality of the auditory-visual displays?

Q5 = How easy or difficult was it to determine both the auditory and visual qualities of the auditory-visual displays?

Q6 = Would you have liked less or more time to view the visual-only displays?

Q7 = Would you have liked less or more time to hear the auditory-only displays?

Q8 = Would you have liked less or more time to hear/view the combined auditory-visual displays?


Figure  72.  Experiment  1:  Post-Experiment  Questions  1-8. 

I.         SUMMARY  AND  CONCLUSIONS 

Overall, the findings suggest that, whether subjects were asked to attend specifically to
both the auditory and visual modalities or to only one modality, both similar and
dissimilar cross-modal auditory-visual perception phenomena exist. When visual display
pixel resolution and auditory display sampling frequency are manipulated, the findings
suggest that:

1) When attending only to the visual modality or attending to both auditory and
visual  modalities,  a  high-quality  visual  display  coupled  with  a  high-quality  auditory 
display  causes  an  increase  in  the  perception  of  visual  display  quality  relative  to 
established  baseline  conditions  derived  from  visual-only  quality  perception  evaluations. 

2)  When  attending  only  to  the  auditory  modality  or  attending  to  both  auditory  and 
visual  modalities,  a  low-quality  auditory  display  coupled  with  a  high-quality  visual 
display causes a decrease in the perception of auditory display quality relative to
established  baseline  conditions  derived  from  auditory-only  quality  perception 
evaluations. 

3)  When  attending  to  both  auditory  and  visual  modalities,  a  high-quality  auditory 
display  coupled  with  a  low-quality  visual  display  causes  an  increase  in  the  perception  of 
auditory  display  quality  relative  to  established  baseline  conditions  derived  from  auditory- 
only  quality  perception  evaluations. 

Would the same findings hold true when other quality parameters are manipulated? To
answer this question, the next chapter investigates whether manipulating visual display
Gaussian white noise level and auditory display Gaussian white noise level produces the
same results.
VIII. EXPERIMENT 2: STATIC NOISE

A.  INTRODUCTION 

Experiment 2: Static Noise investigates the perceptual effects of manipulating visual
display Gaussian noise level and auditory display Gaussian noise level. The visual display
consists of a static image of a radio, depicted in Chapter IV, Figure 32, and the auditory
display is a selection of music. As in the previous experiment, the goal of this experiment
is to answer the following questions:

1) Does a high-quality auditory display coupled with a low-quality visual display
cause  a  decrease/increase  in  the  perception  of  audio  quality  and/or  an  increase/decrease  in 
the  perception  of  visual  quality  relative  to  established  baseline  conditions  derived  from 
auditory-only  and  visual-only  quality  perception  evaluations? 

2)  Does  a  low-quality  auditory  display  coupled  with  a  high-quality  visual  display 
cause  an  increase/decrease  in  the  perception  of  audio  quality  and/or  a  decrease/increase  in 
the  perception  of  visual  quality  relative  to  established  baseline  conditions  derived  from 
auditory-only  and  visual-only  quality  perception  evaluations? 

3)  Does  a  low-quality  auditory  display  coupled  with  a  low-quality  visual  display 
cause  a  decrease/increase  in  the  perception  of  audio  quality  and/or  a  decrease/increase  in 
the  perception  of  visual  quality  relative  to  established  baseline  conditions  derived  from 
auditory-only  and  visual-only  quality  perception  evaluations? 

4)  Does  a  high-quality  auditory  display  coupled  with  a  high-quality  visual  display 
cause  an  increase/decrease  in  the  perception  of  audio  quality  and/or  an  increase/decrease 
in  the  perception  of  visual  quality  relative  to  established  baseline  conditions  derived  from 
auditory-only  and  visual-only  quality  perception  evaluations? 

B.  LOCATION 

Because the building containing the room used for the first experiment was undergoing
electrical rewiring that caused many power outages, this experiment was moved to a
different building. Nevertheless, all testing sessions of Experiment 2: Static Noise were
conducted in a similar isolated room under the same ambient conditions. The room was
slightly smaller than that of the first experiment, at approximately 10 feet x 10 feet.
Before each session, 1) all nonessential electronic equipment was turned off, 2) telephones
were unplugged, 3) windows were closed and covered with blackout cloth, 4) the main
overhead lights were turned off, 5) a 60-watt incandescent desk lamp was turned on
behind the computer monitor to eliminate any glare, 6) the door to the room was closed,
7) a Do Not Disturb sign was placed on the outside of the door, and 8) the subject was
asked to turn off any audible pagers, mobile phones, and/or watches.

C.  PARTICIPANTS 

A total of 36 volunteer participants (27 male, 9 female), drawn from the students, faculty,
staff, and guests of NPS, served as subjects. Because the first experiment (Experiment 1:
Static Resolution) produced only limited gender findings, the numbers of male and female
subjects in this experiment were not balanced. The subjects' average age was 36.1 years,
ranging from 19 to 54. As in the previous experiment, all subjects were required to have
20/20 or corrected-to-20/20 vision and normal hearing. Because the experiment did not
involve precise measurements of Gaussian noise levels, vision and hearing tests were not
needed. Before the experiment, each subject was asked, as part of a voluntary consent
form, whether he or she met the vision and hearing requirements.

D.  APPARATUS 

The apparatus used in this experiment is identical to that of Experiment 1: Static
Resolution. See Chapter VII, Section D.

E.  PROCEDURE 

Except for a few changes discussed below, the procedure of this experiment is identical to
that of the first experiment, Experiment 1: Static Resolution. The experiment used a 3 x 3
factorial within-subjects design. The two independent variables are visual and audio
display quality. The two dependent variables are the corresponding quality perceptions of
the auditory and visual displays. The visual displays were developed exactly as in the first
experiment, except that Gaussian white noise level was manipulated with Adobe Photoshop
[ADOB98] instead of pixel resolution. The three levels of the visual quality independent
variable consist of low-, medium-, and high-quality visual displays of the radio image
depicted in Chapter IV, Figure 32, with added Gaussian noise amounts of 24, 18, and 12,
respectively. Each noise amount is a relative number on the 1-to-999 scale used in Adobe
Photoshop. Likewise, the auditory displays were developed exactly as in the first
experiment, except that the Gaussian noise level of the original 44.1 kHz music selection
was manipulated with Sonic Foundry's Sound Forge [SONI98] instead of sampling
frequency. The resulting three levels of the auditory quality independent variable consist
of low-, medium-, and high-quality auditory displays of the same music selection,
presented monophonically at 44.1 kHz with Gaussian noise mixed in at 31 percent, 23
percent, and 15 percent, respectively. As such, the parameter manipulated in both the
visual and auditory displays is Gaussian noise level. During the experiment, which lasts
approximately 30 minutes, each subject wears headphones and sits in front of a 20-inch
computer display monitor. The subject's task is to rate the perceived quality of audio-only,
visual-only, and audio-visual displays on Likert rating scales ranging from 1 (low) to 7
(high).
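The noise manipulations were done interactively in Adobe Photoshop and Sound Forge, whose exact noise formulas are not given here. As an illustration only, the sketch below approximates both manipulations under two hypothetical assumptions: Photoshop's noise amount is treated as a standard deviation in 8-bit intensity units, and mixing in N percent noise is modeled as a linear mix in which N percent of the output comes from Gaussian noise. The function names are hypothetical; only the level values come from the text.

```python
import random
from typing import List, Optional

# Noise levels from the text: Photoshop amounts (visual) and mix percentages (audio).
VISUAL_NOISE = {"low": 24, "medium": 18, "high": 12}
AUDIO_NOISE_PERCENT = {"low": 31, "medium": 23, "high": 15}


def add_pixel_noise(pixels: List[int], amount: float, seed: Optional[int] = None) -> List[int]:
    """Add zero-mean Gaussian noise to 8-bit pixel values.

    ASSUMPTION: `amount` is used directly as the noise standard deviation;
    Photoshop's actual mapping from its Amount setting is not documented here.
    """
    rng = random.Random(seed)
    noisy = []
    for value in pixels:
        v = round(value + rng.gauss(0.0, amount))
        noisy.append(min(255, max(0, v)))  # clip to the valid 8-bit range
    return noisy


def mix_audio_noise(samples: List[float], percent: float, seed: Optional[int] = None) -> List[float]:
    """Mix Gaussian noise into audio samples in [-1.0, 1.0].

    ASSUMPTION: output = (1 - p) * signal + p * noise with p = percent / 100,
    where the noise has standard deviation 1/3 of full scale and is clipped
    to [-1.0, 1.0]. Sound Forge's actual mixing rule may differ.
    """
    p = percent / 100.0
    rng = random.Random(seed)
    mixed = []
    for s in samples:
        noise = max(-1.0, min(1.0, rng.gauss(0.0, 1.0 / 3.0)))
        value = (1.0 - p) * s + p * noise
        mixed.append(max(-1.0, min(1.0, value)))  # guard against floating-point overshoot
    return mixed


# Usage: a flat gray row of pixels and a short constant signal, degraded to the
# low-quality visual level and the high-quality audio level, respectively.
noisy_row = add_pixel_noise([128] * 8, VISUAL_NOISE["low"], seed=1)
noisy_audio = mix_audio_noise([0.5] * 4, AUDIO_NOISE_PERCENT["high"], seed=1)
```

Note that in both manipulations a larger noise value means a lower-quality display, which is why the "low" level carries the largest number.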

The lowest- and highest-quality auditory displays that the subjects were to memorize
during the self-calibration phase corresponded to the music selection at 44.1 kHz with
Gaussian noise mixed in at 45 percent and 10 percent, respectively. The lowest- and
highest-quality visual displays that the subjects were to memorize during the
self-calibration phase are depicted in Figure 73 and Figure 74, respectively. The
low-quality visual display has an added Gaussian noise amount of 45, whereas the
high-quality visual display has an added Gaussian noise amount of 10. Again, it is
important to remember that the original displays were in color, and that the actual
Gaussian noise level experienced by the subject can only be seen on the actual 20-inch
computer monitor. However, the low- and high-quality displays depicted in Figure 73 and
Figure 74 are fairly good representations of the quality difference between the actual
displays used in the experiment. Apart from the different auditory and visual stimuli, the
procedure continues exactly as in the previous experiment except for 1) minor changes to
improve the readability of the instructions, 2) an increase in the number of visual-only and
auditory-only quality ratings, and 3) a decrease from 18 to 9 in the number of combined
auditory-visual ratings during the final portion of the experiment. These changes are now
discussed.

Figure 73. Experiment 2: Low-Quality Visual Display Familiarization.

Figure 74. Experiment 2: High-Quality Visual Display Familiarization.

You will now be presented two Visual Displays.

One display is of 'Low Quality' and the other is of 'High Quality'

To see the 'Low Quality' display, click on the 'LOW QUALITY' link

To see the 'High Quality' display, click on the 'HIGH QUALITY' link

You can view either display as long as you like

You can go back and forth between the displays as many times as you like

Later in this experiment, you will be tested on your ability to correctly
identify various quality levels of visual displays. Therefore, at this time
you should try your best to memorize what is considered to be a 'Low Quality'
display, and what is considered to be a 'High Quality' display.

When you are ready to rate the quality of visual displays, click on the 'FINISHED' link.

Press to Continue

Figure 75. Experiment 2: Visual Display Instructions.

Based on the subjects' comments on the previous experiment, the readability of the
instructions was enhanced by adding more white space. This can be seen by comparing the
instructions from the previous experiment, depicted in Chapter VII, Figure 52, with the
revised instructions depicted in Figure 75. Note that the content of the instructions was
not changed; only the readability was enhanced through increased use of white space.
To establish stronger confidence in the baseline ratings for the visual-only and
auditory-only displays, the number of quality ratings made during the visual-only and
auditory-only portions was increased from 9 to 12. However, to conform with the data
analysis of the previous experiment, the first three ratings, consisting of one low-, one
medium-, and one high-quality display, were disregarded. The idea was to let the subject,
unknowingly, see or hear each of the three quality levels once before making a rating that
counted. The baseline ratings were therefore still based on an average of three quality
ratings, as in the previous experiment; the only effect is an increase in confidence in the
baseline ratings, not an increase in the number of stimuli used to average them.
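The baseline rule above can be sketched as follows, assuming each subject's single-modality trials are recorded in presentation order as (quality level, rating) pairs and that a separate baseline is computed per subject and per quality level. The record format, the function name, and the ratings are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, List, Tuple


def baselines(trials: List[Tuple[str, int]], warm_up: int = 3) -> Dict[str, Fraction]:
    """Average one subject's single-modality ratings per quality level.

    The first `warm_up` trials (one low, one medium, one high) are discarded,
    so 12 trials leave 9 ratings, i.e. three per quality level.
    """
    by_level: Dict[str, List[int]] = defaultdict(list)
    for level, rating in trials[warm_up:]:
        by_level[level].append(rating)
    return {level: Fraction(sum(r), len(r)) for level, r in by_level.items()}


# Hypothetical 12-trial sequence for one subject (ratings on the 1-7 Likert scale).
trials = [
    ("high", 6), ("low", 2), ("medium", 4),  # warm-up trials, not scored
    ("medium", 4), ("low", 2), ("high", 6),
    ("low", 1), ("high", 7), ("medium", 5),
    ("high", 6), ("medium", 4), ("low", 2),
]
result = baselines(trials)
print(result["low"], result["medium"], result["high"])  # 5/3 13/3 19/3
```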

The  final  portion  of  the  experiment  was  also  changed  based  on  subjects' 
comments  from  the  previous  experiment.  Subjects  felt  that  rating  18  combined  auditory- 
visual  displays  was  somewhat  long  and  tiresome.  As  a  result,  the  number  of  combined 
auditory-visual  display  ratings  during  the  final  portion  of  the  experiment  was  decreased 
from  18  to  9  in  an  effort  to  maintain  a  higher  level  of  subject  interest. 

Again,  other  than  the  above  mentioned  changes,  the  procedure  of  this  experiment 
is  identical  to  that  of  the  previous  experiment.  As  a  result,  the  same  data  collection 
factors  and  data  analysis  are  used  to  examine  the  results. 

F.        RESULTS  AND  DISCUSSION 

As  with  the  previous  experiment,  the  overall  results  of  this  experiment  suggest 
significant  auditory-visual  cross-modal  perception  phenomena  relevant  to  VE  and 
multimedia  developers.  The  major  findings  of  this  experiment  are  now  discussed. 

1.  Validity 

[Figure 76 plot: visual-only quality ratings for V2 Only Percept, V4 Only Percept, and V6 Only Percept; error bars show ±1 standard error.]

V2 = Low-Quality Visual-Only Percept
V4 = Med-Quality Visual-Only Percept
V6 = High-Quality Visual-Only Percept

Figure 76. Experiment 2: Visual-Only Quality Percept Ratings.

The first and most important consideration is whether the subjects rank ordered the
quality of the visual and auditory displays developed for this experiment according to
their intended rankings. If they did not, the validity of the experiment would be
jeopardized. However, Figure 76 shows that the subjects' overall quality ratings of the
visual displays are properly rank ordered according to this experiment's intended low-,
medium-, and high-quality rankings. Likewise, Figure 77 shows that the overall quality
ratings of the auditory displays are properly rank ordered according to the intended low-,
medium-, and high-quality rankings. Given that the quality ratings of all displays are
properly rank ordered, data analysis with respect to the hypotheses can continue.
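The rank-order check above was made by inspecting the plotted ratings. An equivalent programmatic check, sketched under the assumption that each display's ratings are available as lists keyed by intended quality level (names and values are hypothetical), confirms that the mean rating increases strictly from the low- to the high-quality display:

```python
from statistics import mean
from typing import Dict, List, Sequence


def properly_rank_ordered(
    ratings_by_level: Dict[str, List[float]],
    order: Sequence[str] = ("low", "medium", "high"),
) -> bool:
    """True if mean ratings increase strictly across the intended quality order."""
    means = [mean(ratings_by_level[level]) for level in order]
    return all(lower < higher for lower, higher in zip(means, means[1:]))


# Hypothetical visual-only ratings on the 1-7 Likert scale.
visual = {"low": [2, 3, 2], "medium": [4, 4, 5], "high": [6, 6, 7]}
print(properly_rank_ordered(visual))  # True
```

This only confirms the ordering of the means; it does not test whether adjacent quality levels differ significantly.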

2.  Findings 

Figure 78 represents the results of all one-sample sign tests based on the first null
hypothesis, which states: the difference between a) the visual-only quality rating of a
combined auditory-visual display and b) the baseline rating for the visual-only quality
display is zero. As the results show, none of the quality combinations produced a
statistically significant difference.

Figure 77. Experiment 2: Auditory-Only Quality Percept Ratings.

Figure 79 represents the results of all one-sample sign tests based on the second null
hypothesis, which states: the difference between a) the auditory-only quality rating of a
combined auditory-visual display and b) the baseline rating for the auditory-only quality
display is zero. The results suggest that 1) when subjects were presented a combined
low-quality auditory and high-quality visual display and asked to rate only the quality of
the auditory display, the quality perception of the low-quality auditory display is
decreased (p = .0290), and 2) when subjects were presented a combined high-quality
auditory and high-quality visual display and asked to rate only the quality of the auditory
display, the quality perception of the high-quality auditory display is increased
(p = .0243).

Figure 78. Experiment 2: One Sample Sign Tests for Visual-Only Quality Percept
of Combined Auditory-Visual Displays.

Figure 80 represents the results of all one-sample sign tests based on the third null
hypothesis, which states: the difference between a) the visual quality rating of a combined
auditory-visual display when also rating the auditory display and b) the baseline rating
for the visual-only quality display is zero. As the results show, none of the differences is
significant at the .05 level. It is worth noting, however, that three results are significant at
the .10 level, as shown in the figure.

Figure 81 represents the results of all one-sample sign tests based on the fourth null
hypothesis, which states: the difference between a) the auditory quality rating of a
combined auditory-visual display when also rating the visual display and b) the baseline
rating for the auditory-only quality display is zero. The results suggest that 1) when
subjects were presented a combined medium-quality auditory and medium-quality visual
display and asked to rate both displays, the quality perception of the medium-quality
auditory display is increased (p = .0029), and 2) when subjects were presented a
combined high-quality auditory and high-quality visual display and asked to rate both
displays, the quality perception of the high-quality auditory display is increased
(p = .0294).

Figure 79. Experiment 2: One Sample Sign Tests for Auditory-Only Quality
Percept of Combined Auditory-Visual Displays.
Figure 80. Experiment 2: One Sample Sign Tests for Visual Quality Percept When
Also  Rating  the  Auditory  Display  of  Combined  Auditory-Visual  Displays. 


In terms of response times, Figure 82 represents the average visual quality rating response
times of a combined auditory-visual display when subjects were asked to rate only the
quality of the visual display. Figure 83 represents the average auditory quality rating
response times of a combined auditory-visual display when subjects were asked to rate
only the quality of the auditory display. Figure 84 represents the average combined
auditory and visual quality rating response times of a combined auditory-visual display
when subjects were asked to rate both the auditory and visual displays. The response-time
results show various trends depending on the particular auditory-visual quality
combination. However, several factors limit the ability to analyze these temporal results in
any statistically valid manner. These factors are discussed in the last chapter.

Figure 81. Experiment 2: One Sample Sign Tests for Auditory Quality Percept
When Also Rating the Visual Display of Combined Auditory-Visual Displays.

Figure 82. Experiment 2: Visual-Only Quality Rating Response Times of a
Combined Auditory-Visual Display.

Figure 83. Experiment 2: Auditory-Only Quality Rating Response Times of a Combined Auditory-Visual Display.

In terms of the post-experiment questions, Figure 85 summarizes the subjects' opinions on 1) how easy or difficult it was to determine the quality of the various displays, and 2) whether less or more time was needed to adequately rate the various displays. Keeping in mind that subjects rated their opinions on a Likert scale ranging from 1 to 7 (4 being neutral), the results indicate that determining the quality of both the auditory and visual displays of a combined auditory-visual display was more difficult than determining the quality of a single display, whether that display was presented alone or as part of a combined auditory-visual display. Furthermore, the results indicate that eight seconds was an adequate amount of time to rate the visual-only and auditory-only displays, but that slightly more than eight seconds was desired when rating the combined auditory-visual displays.
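The reading of the time-preference questions (Q6-Q8) against the neutral midpoint can be made explicit with a small helper. This is a minimal sketch, not the analysis used in this research: it assumes that higher ratings mean "more time" (the text states only that 4 is neutral), and the function name `describe_time_preference` and the ±0.25 tolerance band are hypothetical choices for illustration.

```python
NEUTRAL = 4.0  # midpoint of the 1-7 Likert scale (4 = neutral)


def describe_time_preference(mean_rating: float, tolerance: float = 0.25) -> str:
    """Classify a mean Q6-Q8 rating relative to the neutral midpoint.

    Assumption: 1 = "much less time", 7 = "much more time". The tolerance
    band around the midpoint is a hypothetical illustration, not a criterion
    taken from the experiments.
    """
    if not 1.0 <= mean_rating <= 7.0:
        raise ValueError("mean rating must lie on the 1-7 scale")
    if mean_rating > NEUTRAL + tolerance:
        return "more time desired"
    if mean_rating < NEUTRAL - tolerance:
        return "less time desired"
    return "time was adequate"


# Hypothetical means, for illustration only.
print(describe_time_preference(4.1))  # time was adequate
print(describe_time_preference(4.6))  # more time desired
```

Under these assumptions, a mean close to 4 corresponds to "eight seconds was adequate," and a mean somewhat above 4 corresponds to "slightly more time was desired."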

Figure 84. Experiment 2: Response Times of Both Auditory and Visual Displays of a Combined Auditory-Visual Display.

Finally, the remaining questions of the post-experiment survey reveal that 29 of the 36 subjects (80.6%) focused on alphanumerics to determine the quality of the visual displays, and that only 7 of the 36 subjects (19.4%) felt mentally overloaded when rating both the auditory and visual displays simultaneously. As in the previous experiment, several interesting observations were also made concerning the descriptions that the subjects used to determine the quality of the various displays. These observations are outlined in the final chapter.
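The survey percentages can be recomputed directly from the counts. The sketch below rounds to one decimal place, matching the precision used in the text; the helper name `percent` is hypothetical.

```python
def percent(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * count / total, 1)


# Counts reported for the 36 subjects of Experiment 2.
print(percent(29, 36))  # 80.6 (focused on alphanumerics)
print(percent(7, 36))   # 19.4 (felt mentally overloaded)
```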

Q1 = How easy or difficult was it to determine the quality of the visual-only displays?

Q2 = How easy or difficult was it to determine the quality of the auditory-only displays?

Q3 = How easy or difficult was it to determine the visual quality of the auditory-visual displays?

Q4 = How easy or difficult was it to determine the auditory quality of the auditory-visual displays?

Q5 = How easy or difficult was it to determine both the auditory and visual qualities of the auditory-visual displays?

Q6 = Would you have liked less or more time to view the visual-only displays?

Q7 = Would you have liked less or more time to hear the auditory-only displays?

Q8 = Would you have liked less or more time to hear-view the combined auditory-visual displays?

Figure 85. Experiment 2: Post-Experiment Questions 1-8.

G. SUMMARY AND CONCLUSIONS

Overall, the findings suggest that both similar and dissimilar cross-modal auditory-visual perception phenomena exist, whether subjects are asked to attend specifically to both the auditory and visual modalities or to only one modality. These findings suggest that, when both visual and auditory display Gaussian noise levels are manipulated:

1)  When  attending  only  to  the  auditory  modality,  a  low-quality  auditory  display 
coupled  with  a  high-quality  visual  display  causes  a  decrease  in  the  perception  of  auditory 
quality  relative  to  established  baseline  conditions  derived  from  auditory-only  quality 
perception  evaluations. 

2) When attending only to the visual modality, or attending to both auditory and visual modalities, a high-quality auditory display coupled with a high-quality visual display causes an increase in the perception of visual quality relative to established baseline conditions derived from visual-only quality perception evaluations.

3)  When  attending  to  both  auditory  and  visual  modalities,  a  medium-quality  auditory 
display  coupled  with  a  medium-quality  visual  display  causes  an  increase  in  the  perception 
of  auditory  quality  relative  to  established  baseline  conditions  derived  from  auditory-only 
quality  perception  evaluations. 
Thus far, the first two experiments have used a perceptually tight coupling of radio and music to represent the visual and auditory displays. However, would the same findings hold if the auditory and visual displays were not semantically associated with each other? The next chapter describes the final experiment of this research effort, which addresses this question.
IX. EXPERIMENT 3: STATIC RESOLUTION NONALPHANUMERIC

A.  INTRODUCTION 

Experiment 3: Static Resolution NonAlphanumeric is designed to investigate the perceptual effects of manipulating visual display pixel resolution and auditory display sampling frequency. The visual display consists of the fruit-flower scene depicted in Chapter IV, Figure 33, and the auditory display is a selection of music. As in the previous experiments, the goal of this experiment is to investigate the following questions:

1)  Does  a  high-quality  auditory  display  coupled  with  a  low-quality  visual  display 
cause  a  decrease/increase  in  the  perception  of  audio  quality  and/or  an  increase/decrease  in 
the  perception  of  visual  quality  relative  to  established  baseline  conditions  derived  from 
auditory-only  and  visual-only  quality  perception  evaluations? 

2)  Does  a  low-quality  auditory  display  coupled  with  a  high-quality  visual  display 
cause  an  increase/decrease  in  the  perception  of  audio  quality  and/or  a  decrease/increase  in 
the  perception  of  visual  quality  relative  to  established  baseline  conditions  derived  from 
auditory-only  and  visual-only  quality  perception  evaluations? 

3)  Does  a  low-quality  auditory  display  coupled  with  a  low-quality  visual  display 
cause  a  decrease/increase  in  the  perception  of  audio  quality  and/or  a  decrease/increase  in 
the  perception  of  visual  quality  relative  to  established  baseline  conditions  derived  from 
auditory-only  and  visual-only  quality  perception  evaluations? 

4)  Does  a  high-quality  auditory  display  coupled  with  a  high-quality  visual  display 
cause  an  increase/decrease  in  the  perception  of  audio  quality  and/or  an  increase/decrease 
in  the  perception  of  visual  quality  relative  to  established  baseline  conditions  derived  from 
auditory-only  and  visual-only  quality  perception  evaluations? 

B.  LOCATION 

The location and ambient conditions for this experiment were identical to those of the previous experiment, Experiment 2: Static Noise (see Chapter VIII, Section B).
C. PARTICIPANTS

A total of 36 volunteer participants (14 Male, 22 Female), drawn from the students, faculty, staff, and guests of NPS, served as subjects. Again, because the first two experiments yielded only limited gender-related findings, the numbers of male and female subjects in this experiment are not balanced. The average age of the subjects is 35.5 years, ranging from 11 to 59 (two female subjects did not give their age). As with the previous experiment, all subjects were required to have 20/20 or corrected 20/20 vision and normal hearing. Because the experiment did not involve precise measurements of pixel resolution or sampling frequency, vision and hearing tests were not needed. Instead, before the experiment began, each subject was asked, as part of a voluntary consent form, whether he or she met the vision and hearing requirements.

D.  APPARATUS 

The apparatus used in this experiment is identical to that of the first two experiments, Experiment 1: Static Resolution and Experiment 2: Static Noise (see Chapter VII, Section D).

E.  PROCEDURE 

The procedure of this experiment is identical to that of the previous experiment, Experiment 2: Static Noise. The experiment uses a 3x3 factorial within-subjects design. The two independent variables are visual and auditory display quality. The two dependent variables are the corresponding quality perceptions of the auditory and visual displays. The three levels of the visual quality independent variable consist of low-, medium-, and high-quality visual displays of the fruit-flower scene depicted earlier in Chapter IV, Figure 33, having resolutions of 34 pixels/inch, 50 pixels/inch, and 66 pixels/inch, respectively. Another key reason for using the fruit-flower scene is that it contains no alphanumerics, hence the name of this experiment. In the previous two experiments, 60 out of 72 subjects (83.3%) focused on alphanumerics when determining the quality of the visual displays. Another goal of this experiment, therefore, is to investigate whether a lack of alphanumeric features has any effect on the overall ability of the subjects to determine the quality of the visual displays. The three levels of the auditory quality independent variable consist of low-, medium-, and high-quality auditory displays of the same music selection, presented monophonically at sampling rates of 11 kHz, 19 kHz, and 35 kHz, respectively. Thus, the manipulated visual display parameter is pixel resolution, and the manipulated auditory display parameter is sampling frequency. During the experiment, which lasts approximately 30 minutes, each subject wears headphones and sits in front of a 20-inch computer display monitor. The subject's task is to rate the perceived quality of auditory-only, visual-only, and auditory-visual displays on Likert rating scales ranging from 1 (low) to 7 (high).
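The 3x3 within-subjects design pairs every visual level with every auditory level, giving nine combined auditory-visual conditions. The following sketch enumerates them using the resolutions and sampling rates stated above. The names `VISUAL_PPI`, `AUDIO_KHZ`, and `combined_conditions` are hypothetical, and presentation order and randomization are not modeled.

```python
from itertools import product

# Levels as stated in the procedure; the dictionary layout is illustrative.
VISUAL_PPI = {"low": 34, "medium": 50, "high": 66}  # pixel resolution, pixels/inch
AUDIO_KHZ = {"low": 11, "medium": 19, "high": 35}   # sampling frequency, kHz


def combined_conditions():
    """Enumerate every visual x auditory quality pairing (3 x 3 = 9)."""
    return [
        {"visual": v, "ppi": VISUAL_PPI[v], "audio": a, "khz": AUDIO_KHZ[a]}
        for v, a in product(VISUAL_PPI, AUDIO_KHZ)
    ]


conditions = combined_conditions()
print(len(conditions))  # 9
```

Because the design is within subjects, each subject would encounter all nine pairings, in addition to the visual-only and auditory-only baseline displays.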

The lowest- and highest-quality auditory displays that the subjects were asked to memorize during the self-calibration phase corresponded to the music selection sampled at 8 kHz and 44.1 kHz, respectively. The lowest- and highest-quality visual displays that the subjects were asked to memorize during the self-calibration phase are depicted in Figure 86 and Figure 87, respectively. The low-quality visual display has a resolution of 28 pixels/inch, whereas the high-quality visual display has a resolution of 72 pixels/inch. Again, it is important to remember that the original displays were presented in color, and that the actual pixel resolution experienced by the subjects can only be seen on the actual 20-inch computer monitor. Nevertheless, the low- and high-quality displays depicted in Figure 86 and Figure 87 are fairly good representations of the quality difference between the actual displays used in the experiment. Apart from the different auditory and visual stimuli, the procedure continues exactly as in the previous experiment. As a result, the same data collection factors and data analysis are used to examine the results.
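Because the self-calibration displays are described as the lowest- and highest-quality examples, every test stimulus should fall strictly between them. A minimal check using only the values given in the text (the constant and function names are hypothetical):

```python
# Calibration anchors and test levels as stated in the text.
VISUAL_ANCHORS_PPI = (28, 72)
VISUAL_LEVELS_PPI = (34, 50, 66)
AUDIO_ANCHORS_KHZ = (8.0, 44.1)
AUDIO_LEVELS_KHZ = (11.0, 19.0, 35.0)


def strictly_bracketed(anchors, levels):
    """True if every test level lies strictly between the low and high anchors."""
    low, high = anchors
    return all(low < level < high for level in levels)


print(strictly_bracketed(VISUAL_ANCHORS_PPI, VISUAL_LEVELS_PPI))  # True
print(strictly_bracketed(AUDIO_ANCHORS_KHZ, AUDIO_LEVELS_KHZ))    # True
```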
Figure 86. Experiment 3: Low-Quality Visual Display Familiarization.

Figure 87. Experiment 3: High-Quality Visual Display Familiarization.

Figure 88. Experiment 3: Visual-Only Quality Percept Ratings.

F. RESULTS AND DISCUSSION

As with the previous experiment, the overall results of this experiment suggest the existence of statistically significant auditory-visual cross-modal perception phenomena relevant to VE and multimedia developers. The major findings of this experiment are now discussed.

1.  Validity 

Figure 89. Experiment 3: Auditory-Only Quality Percept Ratings.

As with the previous experiments, the first and most important consideration is whether the subjects rank ordered the quality of the visual and auditory displays developed for this experiment according to their intended rankings. If this were not the case, the validity of the experiment would be jeopardized. However, Figure 88 shows that the subjects' overall quality ratings of the visual displays are properly rank ordered according to this experiment's intended low-, medium-, and high-quality rankings. Thus, the lack of alphanumeric features did not prevent the subjects from correctly ordering the visual displays by quality. Likewise, Figure 89 shows that the overall quality ratings of the auditory displays are properly rank ordered according to this experiment's intended low-, medium-, and high-quality rankings. Given that the quality ratings of all displays are properly rank ordered, data analysis with respect to the hypotheses can continue.
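The validity criterion amounts to checking that mean ratings increase from the low- to the medium- to the high-quality level. A minimal sketch, assuming ratings are grouped by intended quality level; the function name and the example ratings are hypothetical and are not data from the experiment.

```python
from statistics import mean


def properly_rank_ordered(ratings_by_level):
    """Return True if mean ratings increase strictly from low to medium to high.

    `ratings_by_level` maps "low", "medium", and "high" to lists of 1-7 ratings.
    """
    means = [mean(ratings_by_level[level]) for level in ("low", "medium", "high")]
    return means[0] < means[1] < means[2]


# Hypothetical ratings for illustration only.
example = {"low": [2, 3, 2], "medium": [4, 4, 5], "high": [6, 5, 6]}
print(properly_rank_ordered(example))  # True
```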

2.  Findings 

Figure 90. Experiment 3: One Sample Sign Tests for Visual-Only Quality Percept of Combined Auditory-Visual Displays.

Figure 90 presents the results of all one-sample sign tests based on the first null hypothesis, which states: the difference between a) the visual-only quality rating of a combined auditory-visual display and b) the baseline rating for the visual-only quality display is zero. The results show two statistically significant findings. First, when subjects were presented a combined high-quality visual and medium-quality auditory display and asked to rate only the quality of the visual display, a statistically significant finding at the .0201 level suggests that the quality perception of a high-quality visual display is increased when coupled with a medium-quality auditory display. Second, when subjects were presented a combined high-quality visual and high-quality auditory display and asked to rate only the quality of the visual display, a statistically significant finding at the .0161 level suggests that the quality perception of a high-quality visual display is increased when coupled with a high-quality auditory display.
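The one-sample sign tests summarized in Figures 90 through 93 can be sketched as follows, assuming each subject contributes one difference score (combined-display rating minus the corresponding baseline rating) per condition. The sketch computes an exact two-sided binomial p-value and discards zero differences. Both are common conventions but are assumptions here, since the text does not state which variant the statistical software used. The function name and the sample differences are hypothetical.

```python
from math import comb


def sign_test(differences):
    """Exact two-sided one-sample sign test of H0: median difference = 0.

    Zero differences (ties) are discarded. The p-value is twice the smaller
    binomial tail probability under p = 0.5, capped at 1.
    Returns (number positive, number negative, p-value).
    """
    positives = sum(1 for d in differences if d > 0)
    negatives = sum(1 for d in differences if d < 0)
    n = positives + negatives
    if n == 0:
        return positives, negatives, 1.0
    k = min(positives, negatives)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return positives, negatives, min(1.0, 2 * tail)


# Hypothetical differences: combined-display visual rating minus visual-only baseline.
diffs = [1, 2, 1, 0, 1, 1, -1, 1, 2, 1, 1]
print(sign_test(diffs))  # (9, 1, 0.021484375)
```

The sign test uses only the direction of each difference, not its size. This suits ordinal Likert ratings, where the distance between adjacent scale points cannot be assumed to be equal.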

Figure 91 presents the results of all one-sample sign tests based on the second null hypothesis, which states: the difference between a) the auditory-only quality rating of a combined auditory-visual display and b) the baseline rating for the auditory-only quality display is zero. The results show no statistically significant findings for any of the quality combinations.
Figure 91. Experiment 3: One Sample Sign Tests for Auditory-Only Quality Percept of Combined Auditory-Visual Displays.

Figure 92 presents the results of all one-sample sign tests based on the third null hypothesis, which states: the difference between a) the visual quality rating of a combined auditory-visual display when also rating the auditory display and b) the baseline rating for the visual-only quality display is zero. When subjects were presented a combined high-quality visual and high-quality auditory display and asked to rate both the auditory and visual displays, a statistically significant finding at the .0125 level suggests that the quality perception of a high-quality visual display is increased when coupled with a high-quality auditory display.

Figure 92. Experiment 3: One Sample Sign Tests for Visual Quality Percept When Also Rating the Auditory Display of Combined Auditory-Visual Displays.

Figure 93 presents the results of all one-sample sign tests based on the fourth null hypothesis, which states: the difference between a) the auditory quality rating of a combined auditory-visual display when also rating the visual display and b) the baseline rating for the auditory-only quality display is zero. When subjects were presented a combined medium-quality auditory and low-quality visual display and asked to rate both the auditory and visual displays, a statistically significant finding at the .0351 level suggests that the quality perception of a medium-quality auditory display is decreased when coupled with a low-quality visual display.

Figure 93. Experiment 3: One Sample Sign Tests for Auditory Quality Percept When Also Rating the Visual Display of Combined Auditory-Visual Displays.

Figure 94. Experiment 3: Visual-Only Quality Rating Response Times of a Combined Auditory-Visual Display.

In terms of response times, Figure 94 shows the average response times for rating the visual quality of a combined auditory-visual display when subjects were asked to rate only the visual display. Figure 95 shows the average response times for rating the auditory quality of a combined auditory-visual display when subjects were asked to rate only the auditory display. Figure 96 shows the average response times for rating both the auditory and visual quality of a combined auditory-visual display when subjects were asked to rate both displays. The response times show various trends across the auditory-visual quality combinations. However, several factors limit the ability to analyze these temporal results in any statistically valid manner. These factors are discussed in the last chapter.

Figure 95. Experiment 3: Auditory-Only Quality Rating Response Times of a Combined Auditory-Visual Display.

In terms of the post-experiment questions, Figure 97 summarizes the subjects' opinions on 1) how easy or difficult it was to determine the quality of the various displays, and 2) whether less or more time was needed to adequately rate the various displays. Keeping in mind that subjects rated their opinions on a Likert scale ranging from 1 to 7 (4 being neutral), the results indicate that determining the quality of both the auditory and visual displays of a combined auditory-visual display was more difficult than determining the quality of a single display, whether that display was presented alone or as part of a combined auditory-visual display. Furthermore, the results indicate that eight seconds was an adequate amount of time to rate the visual-only and auditory-only displays, but that slightly more than eight seconds was desired when rating the combined auditory-visual displays.

Figure 96. Experiment 3: Response Times of Both Auditory and Visual Displays of a Combined Auditory-Visual Display.

Finally, the remaining questions of the post-experiment survey reveal that only 9 of the 36 subjects (25.0%) felt mentally overloaded when rating both the auditory and visual displays simultaneously. As in the previous experiment, several interesting observations were also made concerning the descriptions that the subjects used to determine the quality of the various displays. These observations are outlined in the final chapter.

Q1 = How easy or difficult was it to determine the quality of the visual-only displays?

Q2 = How easy or difficult was it to determine the quality of the auditory-only displays?

Q3 = How easy or difficult was it to determine the visual quality of the auditory-visual displays?

Q4 = How easy or difficult was it to determine the auditory quality of the auditory-visual displays?

Q5 = How easy or difficult was it to determine both the auditory and visual qualities of the auditory-visual displays?

Q6 = Would you have liked less or more time to view the visual-only displays?

Q7 = Would you have liked less or more time to hear the auditory-only displays?

Q8 = Would you have liked less or more time to hear-view the combined auditory-visual displays?

Figure 97. Experiment 3: Post-Experiment Questions 1-8.

G. SUMMARY AND CONCLUSIONS

Overall, the findings suggest that both similar and dissimilar cross-modal auditory-visual perception phenomena exist, whether subjects are asked to attend specifically to both the auditory and visual modalities or to only one modality. These findings suggest that, when visual display pixel resolution and auditory display sampling frequency are manipulated:

1)  When  attending  only  to  the  visual  modality,  a  high-quality  visual  display  coupled 
with  a  medium-quality  auditory  display  causes  an  increase  in  the  perception  of  visual 
quality  relative  to  established  baseline  conditions  derived  from  visual-only  quality 
perception  evaluations. 

2)  When  attending  only  to  the  visual  modality,  or  attending  to  both  auditory  and 
visual  modalities,  a  high-quality  visual  display  coupled  with  a  high-quality  auditory 
display  causes  an  increase  in  the  perception  of  visual  quality  relative  to  established 
baseline  conditions  derived  from  visual-only  quality  perception  evaluations. 

3)  When  attending  to  both  auditory  and  visual  modalities,  a  medium-quality  auditory 
display  coupled  with  a  low-quality  visual  display  causes  a  decrease  in  the  perception  of 
auditory  quality  relative  to  established  baseline  conditions  derived  from  auditory-only 
quality  perception  evaluations. 
Therefore, even though the auditory and visual displays were not perceptually tightly coupled as in the first two experiments, the results indicate that auditory-visual cross-modal perception phenomena persist. The next chapter presents an overview of the combined results from all three experiments.
X. SUMMARY AND CONCLUSIONS

A.  INTRODUCTION 

This chapter represents the culmination of two and a half years of research and development investigating auditory-visual cross-modal perception phenomena. The overall results, conclusions, impact, observations, recommendations, future work, and final thoughts are presented.

B.  OVERALL  RESULTS 

Because all data were collected under identical experimental procedures and with the same low-, medium-, and high-quality ordering of the auditory and visual stimuli, combining the datasets from all three experiments to consider overall results is justified. The following are the overall results from the combined datasets of all three experiments.
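Pooling matters because the sign test's power grows with the number of nonzero differences. The sketch below assumes that pooling means concatenating the per-subject difference scores for a given condition across the three experiments before testing. The helper names and the three small difference lists are hypothetical illustrations, not data from these experiments.

```python
from math import comb


def sign_test_p(differences):
    """Exact two-sided sign test p-value; zero differences are discarded."""
    pos = sum(d > 0 for d in differences)
    neg = sum(d < 0 for d in differences)
    n = pos + neg
    if n == 0:
        return 1.0
    k = min(pos, neg)
    return min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)


def pool(*experiments):
    """Concatenate one condition's difference scores across experiments."""
    return [d for diffs in experiments for d in diffs]


# Hypothetical difference scores for one condition in each experiment.
exp1 = [1, 1, -1, 1, 0]
exp2 = [1, 0, 1, 1, -1]
exp3 = [1, 1, 1, 0, 1]

for diffs in (exp1, exp2, exp3):
    print(sign_test_p(diffs))  # 0.625, then 0.625, then 0.125
print(sign_test_p(pool(exp1, exp2, exp3)))  # 0.03857421875
```

In this hypothetical example, no single experiment reaches the .05 level, but the pooled differences do. This illustrates one reason a combined analysis can yield smaller p-values than the individual experiments.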

1.  Participants 

Overall, a total of 108 volunteer participants (59 Male, 49 Female), drawn from the students, faculty, staff, and guests of NPS, served as subjects. The overall average age of the subjects is 36.1 years, ranging from 11 to 63 (four female subjects did not give their age). All subjects were required to have 20/20 or corrected 20/20 vision and normal hearing. Accordingly, before each experiment, each subject was asked, as part of a voluntary consent form, whether he or she met the vision and hearing requirements.

2.  Validity 

Again, the first and most important consideration is whether the subjects rank ordered the overall quality of the visual and auditory displays according to their intended rankings. Figure 98 shows that the overall quality ratings of the visual displays are properly rank ordered by the subjects. Likewise, Figure 99 shows that the overall quality ratings of the auditory displays are properly rank ordered by the subjects. Given that the quality ratings of all displays are properly rank ordered, data analysis with respect to the hypotheses can continue.

[Cell line chart of mean quality ratings for V2 = Low-Quality, V4 = Med-Quality, and V6 = High-Quality Visual-Only Percepts; error bars: ±1 standard error.]

Figure 98. Combined Data: Visual-Only Quality Percept Ratings.

3.  Overall  Findings 

Figure 99. Combined Data: Auditory-Only Quality Percept Ratings.

Figure 100 presents the results of all one-sample sign tests based on the first null hypothesis, which states: the difference between a) the visual-only quality rating of a combined auditory-visual display and b) the baseline rating for the visual-only quality display is zero. The results show two statistically significant findings. First, when subjects were presented a combined high-quality visual and medium-quality auditory display and asked to rate only the quality of the visual display, a statistically significant finding at the .0124 level suggests that the quality perception of a high-quality visual display is increased when coupled with a medium-quality auditory display. Second, when subjects were presented a combined high-quality visual and high-quality auditory display and asked to rate only the quality of the visual display, a statistically significant finding at the .0002 level strongly suggests that the quality perception of a high-quality visual display is increased when coupled with a high-quality auditory display.

Figure 100. Combined Data: One Sample Sign Tests for Visual-Only Quality Percept of Combined Auditory-Visual Displays.

Figure 101 presents the results of all one-sample sign tests based on the second null hypothesis, which states: the difference between a) the auditory-only quality rating of a combined auditory-visual display and b) the baseline rating for the auditory-only quality display is zero. The results show two statistically significant findings. First, when subjects were presented a combined low-quality auditory and medium-quality visual display and asked to rate only the quality of the auditory display, a statistically significant finding at the .0375 level suggests that the quality perception of a low-quality auditory display is decreased when coupled with a medium-quality visual display. Second, when subjects were presented a combined low-quality auditory and high-quality visual display and asked to rate only the quality of the auditory display, a statistically significant finding at the .0002 level strongly suggests that the quality perception of a low-quality auditory display is decreased when coupled with a high-quality visual display.

Figure 101. Combined Data: One Sample Sign Tests for Auditory-Only Quality Percept of Combined Auditory-Visual Displays.

Figure 102 presents the results of all one-sample sign tests based on the third null hypothesis, which states: the difference between a) the visual quality rating of a combined auditory-visual display when also rating the auditory display and b) the baseline rating for the visual-only quality display is zero. The results show three statistically significant findings, all obtained when subjects were asked to rate both the auditory and visual displays. First, for a combined high-quality visual and low-quality auditory display, a statistically significant finding at the .0172 level suggests that the quality perception of a high-quality visual display is increased when coupled with a low-quality auditory display. Second, for a combined high-quality visual and medium-quality auditory display, a statistically significant finding at the .0042 level strongly suggests that the quality perception of a high-quality visual display is increased when coupled with a medium-quality auditory display. Third, for a combined high-quality visual and high-quality auditory display, a statistically significant finding at the .0034 level strongly suggests that the quality perception of a high-quality visual display is increased when coupled with a high-quality auditory display.
Figure 102. Combined Data: One Sample Sign Tests for Visual Quality Percept When Also Rating the Auditory Display of Combined Auditory-Visual Displays.


Figure 103 represents the results of all one-sample sign tests based on the fourth null hypothesis, which states that the difference between a) the auditory quality rating of a combined auditory-visual display when also rating the visual display and b) the baseline rating for the auditory-only quality display is zero. None of the quality combinations yields a statistically significant result. However, it is worth mentioning that when subjects were presented with a combined low-quality auditory and high-quality visual display and asked to rate both the auditory and visual displays, the result at the .0586 level, just above the significance level set for this research, suggests that the quality perception of a low-quality auditory display is decreased when coupled with a high-quality visual display.
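The one-sample sign tests summarized in Figures 101 to 103 can be illustrated with a short sketch. This is a hypothetical illustration, not the computation performed by the statistical package used in this research: it assumes each subject contributes one difference (combined-display rating minus baseline rating), that zero differences are discarded, and that the p-value is exact and two-sided under a Binomial(n, 0.5) null. The function name `sign_test` and the sample differences are assumptions made for the example.

```python
from math import comb


def sign_test(differences):
    """Exact two-sided one-sample sign test of H0: median difference = 0.

    Zero differences carry no sign information and are discarded.
    Returns (number of positive signs, number of negative signs, p-value).
    """
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    if n == 0:
        return 0, 0, 1.0  # no informative differences: H0 cannot be rejected
    n_pos = sum(1 for d in nonzero if d > 0)
    k = min(n_pos, n - n_pos)  # count in the smaller tail
    # Under H0 the number of positive signs follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_pos, n - n_pos, min(1.0, 2 * tail)


# Hypothetical rating differences (combined minus baseline) on a 1-7 scale.
diffs = [1, 2, 0, 1, -1, 1, 0, 2, 1, 1]
n_pos, n_neg, p = sign_test(diffs)
print(n_pos, n_neg, round(p, 4))  # prints: 7 1 0.0703
```

The sign test uses only the direction of each difference, not its size, which suits ordinal Likert ratings.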
Figure 103. Combined Data: One Sample Sign Tests for Auditory Quality Percept
When  Also  Rating  the  Visual  Display  of  Combined  Auditory-Visual  Displays. 


In terms of response times, Figure 104 represents the overall average visual quality rating response times of a combined auditory-visual display when subjects were asked to rate only the quality of the visual display. Figure 105 represents the overall average auditory quality rating response times of a combined auditory-visual display when subjects were asked to rate only the quality of the auditory display. Figure 106 represents the overall average combined auditory and visual quality rating response times of a combined auditory-visual display when subjects were asked to rate both the auditory and visual displays. Again, the overall response times show various trends; however, several factors limit the ability to analyze these temporal results in any statistically valid manner. These factors are discussed in the OBSERVATIONS section below.

Figure 104. Combined Data: Visual-Only Quality Rating Response Times of a Combined Auditory-Visual Display.

Figure 105. Combined Data: Auditory-Only Quality Rating Response Times of a Combined Auditory-Visual Display.

Figure 106. Combined Data: Response Times of Both Auditory and Visual Displays of a Combined Auditory-Visual Display.

In terms of the post-experiment questions, Figure 107 represents the subjects' overall opinions on 1) how easy or difficult it was to determine the quality of the various displays, and 2) whether less or more time was needed to adequately rate the various displays. Subjects rated their opinions on a Likert scale ranging from 1 to 7 (4 being neutral). Overall, the results indicate that determining the quality of both the auditory and visual displays of a combined auditory-visual display proved to be more difficult than determining the quality of a single auditory or visual display, whether presented alone or in combination. Furthermore, the results indicate that eight seconds was an adequate amount of time overall to rate the visual-only and auditory-only displays, but that slightly more than eight seconds was desired when rating the combined auditory-visual displays.

Finally, the remaining questions of the post-experiment survey reveal that 60 of 72 subjects (83.3%) focused on alphanumerics when determining the quality of the visual displays (only applicable in the first two experiments) and that 36 of the 108 subjects (33.3%) felt mentally overloaded when having to rate both auditory and visual displays simultaneously.

Q7 = Would you have liked less or more time to hear the auditory-only displays?

Q8 = Would you have liked less or more time to hear-view the combined auditory-visual displays?

Figure 107. Combined Data: Post-Experiment Questions 1-8.

C. OVERALL CONCLUSIONS

The goal of this research, to identify whether pertinent auditory-visual cross-modal perception phenomena exist, has been achieved. By varying the quality (fidelity) of both auditory and visual displays, it has been possible to measure auditory-visual cross-modal perception phenomena. The overall conclusions suggest that cross-modal auditory-visual perception phenomena exist regardless of 1) whether subjects are asked to attend specifically to both the auditory and visual modalities or to only one modality, 2) whether visual display pixel resolution or Gaussian noise level is manipulated, 3) whether auditory display sampling frequency or Gaussian noise level is manipulated, or 4) whether an auditory-visual display is tightly or loosely coupled. More specifically, these findings strongly suggest:

1)  When  attending  only  to  the  visual  modality,  a  high-quality  visual  display 
coupled  with  either  a  medium-  or  high-quality  auditory  display  causes  an  increase  in  the 
perception  of  visual  quality  relative  to  established  baseline  conditions  derived  from 
visual-only  quality  perception  evaluations. 

2)  When  attending  only  to  the  auditory  modality,  a  low-quality  auditory  display 
coupled  with  either  a  medium-  or  high-quality  visual  display  causes  a  decrease  in  the 
perception  of  auditory  quality  relative  to  established  baseline  conditions  derived  from 
auditory-only  quality  perception  evaluations. 
3) When attending to both auditory and visual modalities, a high-quality visual
display  coupled  with  a  low-,  medium-,  or  high-quality  auditory  display  causes  an  increase 
in  the  perception  of  visual  quality  relative  to  established  baseline  conditions  derived  from 
visual-only  quality  perception  evaluations. 

Another finding worth mentioning, with a p-value (.0586) just slightly above the level of statistical significance set for this research, is that when attending to both auditory and visual modalities, a low-quality auditory display coupled with a high-quality visual display causes a decrease in the perception of auditory quality relative to established baseline conditions derived from auditory-only quality perception evaluations.

Overall, these results provide empirical evidence to support what most people in the gaming business, multimedia industry, entertainment industry, and VE community have suspected all along: that audio can influence the quality perception of video, and that video can influence the quality perception of audio. The results also indicate that although we can divide our attention between audition and vision, we are not consciously aware of potentially significant intermodality effects.

D.       IMPACT 

Because of the multi-disciplinary nature of this research effort, the overall findings have far-reaching impact, with both theoretical and commercial implications.

1.  Theoretical  Impact 

The findings of this study have diverse theoretical implications. The following subsections describe their impact on Sensory Interaction, Visual Dominance, Divided Attention, and Time-Sharing.

a.    Sensory  Interaction 

Because the overall findings indicate that auditory quality can influence visual quality perception and vice versa, some form of sensory interaction must be taking place. These findings support many of the conclusions outlined earlier in Chapter II, Section C. For example, they support the early intersensory research conclusions of both Ryan [RYAN40] and Gilbert [GILB41]. O'Connor and Hermelin [OCON81] would also argue that these findings support the concept of sensory capture. How this sensory interaction occurs, however, is still not known. Stein and Meredith [STEI93] might conclude that the interaction takes place at the neurological level, in single multimodal neurons, as depicted earlier in Figure 4 and Figure 5. Gibson [GIBS66] [GIBS79], on the other hand, might argue that this sensory interaction is based on the complexity of natural life events.

b.  Visual  Dominance 

One of the overall findings of this research effort suggests that when attending only to the auditory modality, a low-quality auditory display coupled with either a medium- or high-quality visual display causes a decrease in the perception of auditory quality. This degraded perception of auditory quality might be explained by the concept of visual dominance discussed earlier in Chapter II, Section E and Chapter III, Section F. Perhaps, at some higher cognitive level, the higher-quality visual display is being compared with the lower-quality auditory display. This unconscious comparison might cause one to perceive the auditory quality as worse than it actually is because of the dominating nature of the visual modality.

c.  Divided  Attention 

The  overall  findings  of  this  research  indicate  that  humans  can  effectively 
divide  their  attention  between  the  auditory  and  visual  sensory  modalities.  This  ability  to 
divide  one's  attention  between  the  auditory  and  visual  sensory  modalities  supports  the 
various  attention  theories  discussed  earlier  in  Chapter  II,  Section  F. 

d.  Time-Sharing 

Although this research supports the ability to divide attention between the auditory and visual sensory modalities, the time-sharing question remains: do we process these simultaneous auditory and visual stimuli in parallel or serially? If the overall results indicated that we process simultaneous auditory and visual stimuli serially, this would lend support to the Single-Resource Theory discussed earlier in Chapter II, Section F. If they indicated that we process such stimuli in parallel, this would lend support to the Multiple-Resource Theory, also discussed in Chapter II, Section F. Since 33.3% of all subjects felt mentally overloaded when having to rate both auditory and visual displays simultaneously, one might conclude that these particular subjects did not have adequate time to rate both displays serially and therefore had to process the simultaneous auditory and visual displays in parallel, which was mentally overloading. If this were true, it would lend support to the Multiple-Resource Theory. However, it is important to note that this research effort permits no assumptions about how the subjects processed the simultaneous auditory and visual stimuli. Consequently, no time-sharing conclusions can be drawn from the overall results of this research effort.
2. Commercial Impact

The findings of this study also have diverse commercial implications. For example, one of the overall findings of this research effort suggests that when attending only to the visual modality, a high-quality visual display coupled with either a medium- or high-quality auditory display causes an increase in the overall visual quality perception of an auditory-visual display. Suppose that a fictitious company, ACME Cyber Art, sells contemporary paintings via the Internet. ACME Cyber Art's current web-based advertising depicts only photographs of the paintings, which prospective customers can purchase on-line. ACME Cyber Art, however, wants to increase its sales. One possible strategy is simply to add medium- or high-quality music to its web page while prospective customers are looking at the various artworks. The perceived visual quality of the artworks might then increase relative to the same photographs viewed without music, possibly increasing the probability that a customer will make a purchase.
Another finding of this research effort suggests that when attending only to the auditory modality, a low-quality auditory display coupled with either a medium- or high-quality visual display causes a decrease in the overall auditory quality perception of an auditory-visual display. Suppose, for example, that the next GRAMMY Awards were partially decided by Internet-based votes. Music fans would point their web browsers to the GRAMMY Awards web site to cast their votes. This web site would contain high-quality visual images of the various nominated musical talents. By clicking on the image of a particular musical talent, one could hear a short 15-second audio clip of the nominated song. In an effort to 1) decrease rendering time, 2) decrease storage requirements, and 3) decrease download time, suppose the GRAMMY web site designers decreased the sampling frequency of the audio clips from 44.1 kHz to 10 kHz. Because these low-quality audio clips would be coupled with high-quality images, fans might perceive them as even worse than they actually are. To the surprise of the web site designers, most fans might then complain that the quality of the audio clips was very poor, making it impossible to cast their votes properly. Consequently, the Internet-based voting of the GRAMMY Awards might be a huge failure.
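The storage and download savings in this hypothetical scenario follow from the fact that the size of uncompressed PCM audio scales linearly with sampling frequency. The sketch below assumes 16-bit stereo, uncompressed PCM clips with file-header overhead ignored; the scenario itself does not specify a format, so these are assumptions. It also reports the Nyquist frequency (half the sampling rate), the highest frequency a given rate can represent, which helps explain why a 10 kHz clip would sound noticeably duller than a 44.1 kHz one.

```python
def pcm_bytes(sample_rate_hz, seconds, bits_per_sample=16, channels=2):
    """Size in bytes of an uncompressed PCM clip (header overhead ignored).

    The 16-bit stereo defaults are assumptions, not part of the scenario.
    """
    return sample_rate_hz * seconds * channels * bits_per_sample // 8


def nyquist_hz(sample_rate_hz):
    """Highest frequency representable without aliasing: half the sampling rate."""
    return sample_rate_hz / 2


CLIP_SECONDS = 15  # clip length from the scenario
for rate in (44_100, 10_000):
    size = pcm_bytes(rate, CLIP_SECONDS)
    print(f"{rate} Hz: {size:,} bytes, bandwidth limit {nyquist_hz(rate):,.0f} Hz")

ratio = pcm_bytes(44_100, CLIP_SECONDS) / pcm_bytes(10_000, CLIP_SECONDS)
print(f"size reduction factor: {ratio:.2f}")
```

Under these assumptions each clip shrinks from 2,646,000 to 600,000 bytes (a factor of 4.41), while the reproducible bandwidth drops from 22,050 Hz to 5,000 Hz.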

Another finding of this research effort suggests that when attending to both auditory and visual modalities, a high-quality visual display coupled with a low-, medium-, or high-quality auditory display causes an increase in the overall visual quality perception of an auditory-visual display. Suppose, for example, that a VE developer has been tasked to increase the realism (and perhaps presence) of a 3D scene depicting a typical family living room. The current virtual living room contains a TV and a stereo system, which are rendered using high-quality visual graphics. However, the living room scene does not have any associated sounds. Instead of increasing the pixel resolution of the living room scene, which would cause an unwanted increase in its visual rendering time, the VE developer adds 1) high-quality music to the stereo system, and 2) an MPEG video sequence containing high-quality audio to the TV display. As a result, the perceived visual quality of the scene ought to increase simply through the addition of the associated auditory displays, without the need to manipulate any of the visual displays.
These preceding examples highlight just a few of the many possible applications of this research effort. Overall, the findings of this research effort are important and could greatly benefit the gaming business, multimedia industry, entertainment industry, VE community, and also the Internet industry.

E.        OBSERVATIONS 

The following describes some of the informal observations noted during the conduct of the main experiments. No formal data analyses were performed on these observations. They are presented to provide the reader with additional peripheral insights into the overall findings of this research effort.

1.  Response  Time  Measurement 

After observing 130 subjects throughout the course of the various experiments, it appears that using the rating scales to collect subject response times may be invalid. The reason stems from the physical layout of the rating scales and the functionality of the mouse. Since the rating scales consist of one or two horizontal sets of radio buttons, the Push to Continue button is farther from choice number one than from choice number four. As a result, it will always take longer to select, for example, choice numbers one and seven than choice number four. To alleviate this problem, all response times need to be normalized to establish a common time metric among all choices. This normalization can be achieved through Fitts's Law, which states that "...the time to move the hand to a target depends only on the relative precision required, that is, the ratio between the target's distance and its size" [CARD83] (see [WICK92] for more information on Fitts's Law). However, Fitts's Law was not applied in this research effort.
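The normalization described above, which was not performed in this research, might be sketched as follows. This is a hypothetical illustration only. It uses the Shannon formulation of Fitts's Law common in the HCI literature, MT = a + b log2(D/W + 1), which may differ from the exact form given in [CARD83]. The button layout, target width, intercept `a`, slope `b`, and the assumption that the cursor starts at the Push to Continue button are all invented for the example; real coefficients would have to be fitted to pointing data.

```python
from math import hypot, log2

# Hypothetical layout in pixels (not measured from the actual experiments):
# seven radio buttons in one horizontal row, with the Push to Continue
# button centred below choice 4, where the cursor is assumed to start.
BUTTON_WIDTH = 20
CHOICE_X = {choice: 100 + 60 * (choice - 1) for choice in range(1, 8)}
CHOICE_Y = 0
CONTINUE_XY = (CHOICE_X[4], 120)

# Hypothetical regression coefficients in seconds; real values would have to
# be fitted to pointing data collected with the same mouse and display.
A_INTERCEPT = 0.2
B_SLOPE = 0.15


def fitts_mt(distance, width, a=A_INTERCEPT, b=B_SLOPE):
    """Predicted movement time (s), Shannon formulation of Fitts's Law."""
    return a + b * log2(distance / width + 1)


def movement_distance(choice):
    """Straight-line distance (px) from the Continue button to a choice."""
    cx, cy = CONTINUE_XY
    return hypot(CHOICE_X[choice] - cx, CHOICE_Y - cy)


def normalized_rt(observed_rt, choice):
    """Observed response time minus the predicted pointing time for that choice."""
    return observed_rt - fitts_mt(movement_distance(choice), BUTTON_WIDTH)


# Usage with a made-up observed time: equal raw times for choices 1, 4, and 7
# become unequal once the longer movements to the end choices are discounted.
for choice in (1, 4, 7):
    print(choice, round(normalized_rt(2.0, choice), 3))
```

Subtracting the predicted movement time keeps the result in seconds; dividing by it would be an equally plausible normalization.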

In terms of the combined rating scale, some subjects complained that the visual scale should have been on top, whereas others preferred the current format with the auditory scale on top. The functionality of the mouse and mouse pad also had an undetermined effect on response time. Some subjects complained that the mouse would occasionally stick or slide improperly, while others did not experience any problems. Some subjects kept their hands on the mouse the entire time, while others placed their hands in their laps and then grabbed the mouse when it was time to respond. On a side note, some subjects used the mouse cursor while reading all the instructions and also to point at salient quality features. Some subjects would also slide the cursor to the relative quality position on the rating scale even before the scale appeared. Furthermore, adept computer users are much more efficient with the mouse than someone using its point-and-click paradigm for the first time, and some subjects who were accustomed to trackballs felt uncomfortable using the mouse. In light of all the preceding observations, the response times captured with the rating scales in all three experiments ought to be considered invalid. Therefore, as stated earlier, any statistical analysis of the response times must take the aforementioned observations into account.

2.  Synesthesia  Encounter 

While discussing the experiment, one of the female subjects said that she sometimes experienced various shades of color when listening to classical music. She was not aware of the research that has been done on synesthesia. It was very interesting to discuss synesthesia with someone who actually experiences it.

3.  Subjects  Description  and  Use  of  the  Stimuli 

Perhaps the most interesting observations were gathered from the post-experiment questions, which asked the subjects whether they focused on any particular features when determining quality and, if so, to describe those features. The responses were remarkably diverse, a diversity that stems from the various backgrounds of the subjects. For example, in describing a straight line on the radio, a computer graphics programmer might use the term aliasing, whereas a novice might use the term jaggedness. Also, some subjects felt that it was easier to determine the auditory and visual qualities simultaneously because they could use the stimulus in one modality to support their quality decision in the other modality. The following is an excerpted compilation of the items the subjects focused on and the terms they used to describe them when determining visual and auditory display quality.

a.  Experiment  1:  Static  Resolution 

Visual  Display  Quality  Terms: 

fonts,  lines  at  edge,  patterns,  straight  lines,  text,  control  knobs,  frame 
around  frequency  window,  matrix  on  speaker  pattern,  numbers  on  frequency 
scale, name on radio, top left edge of radio, the "on" and "off" labels, the word
"hallicrafters" on the radio, outside edges of radio, lower speaker line, the lines
going through the image, dial, anti-aliasing, legibility of characters, the word
"turning," the number "12," the upper right-hand portion of the radio, the
white  dots  on  speaker  pattern,  contrast  of  radio  to  background,  pieces  of  dirt  on 
top  of  radio,  highlights,  grill,  letters,  blurring  of  letters  and  numbers,  ridges  on 
dial,  inconsistencies  of  corners  and  the  line  along  the  backside  of  the  radio,  the 
word  "continental"  on  the  radio,  reflecting  light,  white  knob. 

Auditory  Display  Quality  Terms: 

sense  of  remoteness,  cymbals,  the  cymbals  crash,  compressed  versus  open, 
frequencies, low sounded muddy and didn't sustain, treble, guitar, highs versus
lows,  opening  highs,  high  was  more  clear,  high  hat  on  drums,  frequency  range, 
dynamic  range,  the  presence  of  the  closer  sound  appeared  to  be  of  better 
quality,  low  was  muffled  and  high  was  more  treble,  the  counter  point  of  low 
frequency  organ  line,  the  keyboard  resonance  was  more  dynamic  in  the  highs 
than  in  the  lows,  high  sounded  tinny  and  low  quality  had  more  base,  base/treble, 
more  base  in  high  and  less  base  in  low,  high  was  painful  and  low  was  not 
painful,  qualities  seemed  reversed,  low  sounded  farther  back  and  high  sounded 
farther  forward,  the  first  note,  drum  sound,  low  quality  was  more  pleasing,  high 
was  more  irritating,  low  was  more  damped  than  high,  the  low  quality  sounded 
muted,  snare  drums,  low  sounded  better,  clearness  of  music,  low  had  less 
volume,  high  was  more  broad  sounding,  bass  was  high,  the  poor  music  reminded 
me  of  music  in  a  can,  the  good  music  was  a  definite  stereo  sound. 

Combined  Auditory-Visual  Display  Quality  Terms: 

It  was  hard  to  believe  that  the  older  radio  could  play  the  newer  alternative 
music,  reversal  of  auditory  and  visual  qualities. 

b.  Experiment  2:  Static  Noise 

Visual  Display  Quality  Terms: 

small print above lower right and left dials, words under frequency scale,
numbers on frequency scale, granularity quality of background, the "on" and
"off" switch, name of radio, judge readability of alphanumerics, granularity of
edges, brightness of white knob, better resolution means better quality, right
side of radio, letters above the knobs, the word "continental," mesh in speaker,
reflection on front top, darkness of black, clarity of dial numbers, the amount of
brownish distortion in black finish of radio, contrasts between light and dark,
glare in front right top quadrant of radio, shine on top, shadows, light
reflection, lower right-hand-corner, background static, sharpness of "on" "off"
knob, grille holes, outlay of radio, looked at dots all over, fuzziness of the grid
lines on the speaker, corners, graininess of picture, textures, haze on top and
haze on reflection, bottom left of whole image.

Auditory  Display  Quality  Terms: 

piano  accompaniment  in  the  background,  general  level  of  static,  clarity  of 
bass,  clearer  is  higher  quality,  the  louder  static  was  low  quality  and  the  lower 
static  was  the  higher  quality,  differentiate  the  amount  of  static  present,  loudness 
of  static  versus  loudness  of  audio  signal,  hiss  level,  bass  tones,  the  crispness  of 
the music, the frequency pitch of the static background noise, amount of
snow/interference, white noise level, amount of feedback, scratchiness, the frequency
of  static,  level  of  noise,  percent  of  volume  taken  up  by  noise,  the  loudness  of  the 
background  rain,  treble. 

Combined  Auditory-Visual  Display  Quality  Terms: 
sometimes  reversed  auditory  and  visual  qualities. 

c.    Experiment  3:  Static  Resolution  Nonalphanumeric 

Visual  Display  Quality  Terms: 

pixellation  on  lower  leaf,  outline  of  apple  and  fruit  on  the  plate,  upper  edge 
of  apple,  right  side  of  leaf  on  table,  bottom  edge  of  red  rose,  flowers,  carpet, 
texture,  shadowing,  fruit  skin,  the  roses,  peach,  pear,  looking  for  continuous 
lines,  clarity  of  black  spot  on  pear,  weave  of  cloth,  rose  petals,  smoothness  of 
apples,  the  overall  colors,  the  brighter  the  better  the  quality,  blade  of  grass  in 
lower  left  corner,  curved  edges  and  color  blends,  the  contrast  with  the  yellow 
and  red  roses,  looked  at  cleaner  images,  pink  rose  petals,  hard  edges,  the  pixels. 

Auditory  Display  Quality  Terms: 

high-end  tenor  quality,  high  frequencies,  low  quality  sounded  as  though  it 
was  played  in  a  box,  mushing  sound  for  low  quality,  more  pinging  for  high 
quality,  tone  increased  with  high  quality  sound,  low  quality  has  a  deeper  tone, 
high  was  tinny,  the  low  was  hollow  sounding,  the  high  was  sharper,  the  chimes 
sounded  muted  and  the  high  was  full  and  loud,  high  quality  had  higher  notes, 
bass  was  muffled  and  high  had  crisp  cymbals,  more  bass  means  better  quality, 
range  of  tones,  muffling  of  resonance,  equality  of  left  and  right  ears,  hissing  or 
lack  thereof  in  the  background,  low  end  fidelity  and  range  of  sound,  things  I 
could  not  express,  tonal  quality,  clearness  of  bass,  the  higher  pitched  instrument 
coming  through  clearer,  one  is  clear,  the  other  is  distant,  the  guitar  in  the  back, 
loudness  of  the  shower,  brush  strokes  for  the  cymbals,  the  peaks,  the  more  the 
instruments  the  more  the  quality. 
Combined Auditory-Visual Display Quality Terms:

The  bowl  of  fruit  does  not  mix  well  with  the  choice  of  music.  The  choice  of 
music  should  have  been  classical,  reversal  of  audio  and  visual  qualities, 
drumbeat and treble, the more the bass the better the quality.

4.  Reversals 

A very common response from the subjects was that they sometimes felt they may have reversed the ratings of the auditory and visual qualities. This auditory-visual "dyslexia" may be related to some of the findings concerning auditory-visual cross-modal perception.

5.  Recognizable  Quality  Levels 

Upon completion of the experiment, some subjects were astonished when they were told that only three levels of auditory and visual stimuli were utilized. Their astonishment can probably be attributed to the number of choices on the rating scales (seven). Subjects may thus have been anticipating seven levels of quality and, as a result, perceptually conformed to seven quality levels.

F.        RECOMMENDATIONS 

1.  Recruiting  Subjects 

Recruiting volunteer subjects took much longer than originally planned. Researchers should anticipate allocating more time to recruiting subjects than to actually testing them.

2.  Statistical  Analysis  Package 

Because the statistical analysis software package was chosen, and its use mastered, well in advance of collecting data, the data analysis portion was accomplished with much greater ease.
3. Hardware and Software Platform

Because of the immense amount of time and data lost to hardware- and software-related issues during the experimental design phase of this research effort, it is crucial to ensure the reliability and usability of all chosen hardware and software as early as possible in the design phase.

4.  Downloaded  Software 

The freely downloadable software used in this effort greatly facilitated the software development of the main experiments, since the experimenter merely had to download the software and start developing, with no need to waste time visiting a computer software store. Furthermore, since the software is free, precious research funding can be used for other things, such as hardware.

5.  Photoshop  and  SoundForge 

This research would not have been possible without the software used to create the various visual and auditory displays. Adobe Photoshop [ADOB98] and Sonic Foundry's SoundForge [SONI98] proved to be outstanding software packages, and their use is highly recommended.

6.  Visual  Dominance 

It is interesting to note that, because this dissertation is a written document, only the visual stimuli can be presented to the reader, as is evident from the numerous figures. The auditory stimuli can only be imagined. Thus, the reader has a much better understanding of the visual stimuli than of the auditory stimuli. Is this not another example of visual dominance?
G. FUTURE WORK

1.  Choice  of  Quality  Parameters  and  Stimuli 

Since pixel resolution, Gaussian noise level, and sampling frequency were the only quality parameters manipulated, investigating other quality parameters is warranted. Studies of the effects of other stimuli, such as motion video and 3D VEs, are also needed. Such studies would allow a greater scope of potential auditory-visual perception phenomena to be investigated.
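As a starting point for such follow-on studies, the sketch below shows one common way to set a Gaussian white noise level: scale zero-mean Gaussian noise so that the degraded signal reaches a chosen signal-to-noise ratio (SNR). It is an illustration under stated assumptions, not the procedure used to build this dissertation's stimuli (which were created with Adobe Photoshop and SoundForge). The function names, the SNR parameterization, and the 440 Hz test tone are hypothetical. The same function applies equally to audio samples or to a flattened list of pixel intensities.

```python
import math
import random


def add_gaussian_noise(samples, snr_db, rng=None):
    """Return samples plus zero-mean Gaussian white noise at the given SNR (dB).

    The noise standard deviation is chosen so that
    signal power / noise power = 10 ** (snr_db / 10),
    where signal power is the mean squared sample value.
    """
    rng = rng or random.Random()
    signal_power = sum(x * x for x in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in samples]


def measured_snr_db(clean, noisy):
    """SNR actually achieved, computed from the clean and noisy signals."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Hypothetical test signal: one second of a 440 Hz tone sampled at 44.1 kHz.
RATE = 44_100
tone = [math.sin(2 * math.pi * 440 * t / RATE) for t in range(RATE)]
for target in (30, 20, 10):  # illustrative low, medium, and high noise settings
    noisy = add_gaussian_noise(tone, target, random.Random(1))
    print(target, round(measured_snr_db(tone, noisy), 1))
```

Expressing the noise level as an SNR rather than as a raw standard deviation keeps the degradation comparable across stimuli of different loudness or brightness.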

One possible VE scenario might begin by having subjects watch a virtual person (in 3D space) place a radio (playing music) on a table. After watching the virtual radio being placed (dynamically) on the virtual table, subjects might perceive a stronger perceptual grouping between the radio (visual) and music (audio) through increased temporal and spatial synchronization, thereby decreasing the cognitive distance between the radio (visual) and music (audio). As a result, if the experiments outlined in this dissertation were then conducted, the overall results might reveal more statistically significant auditory-visual cross-modal perception phenomena.

2.  Auditory-Visual  Quantitative  Perceptual  Model 

Given that auditory-visual cross-modal perception phenomena exist, the next logical step is to incorporate these overall findings into some type of useful auditory-visual quantitative perceptual model (similar to that proposed by Hollier and Voelcker [HOLL97], as depicted earlier in Figure 29). This model could then be used to derive appropriate (quantitative) levels of auditory and visual fidelity for use by developers in the gaming business, multimedia industry, entertainment industry, VE community, Internet industry, and elsewhere. For example, given a certain application, such a model could help derive the appropriate levels and specific amounts of visual display pixel resolution and auditory display sampling frequency as a function of visual-only, auditory-only, and/or combined auditory-visual media.

3.  Intersensory  Research 

The  exhaustive  literature  review  and  results  of  this  research  effort  make  it  clear 
that  in  order  to  better  understand  the  proper  use  of  multisensory  stimuli,  more  research 
emphasis  needs  to  be  placed  on  investigating  intersensory  phenomena.  This  increased 
emphasis  need  not  be  limited  to  auditory-visual  interactions  but  ought  to  include 
investigating  auditory-visual-haptic  interactions. 

4.  On-line  Experiments 

Because on-line experiments can potentially reach many (perhaps thousands of)
subjects with little effort, they can greatly facilitate scientific research. For this reason,
all the experiments contained in this research effort can be run on-line. However, on-line
experiments make it difficult to control the conditions of the experiment (e.g., hardware
specifications, proper subject participation, and environmental conditions), and being
able to control these conditions is vital when conducting experiments. Nevertheless, this
research represents a first attempt at conducting on-line experiments, and it is hoped that
its approach can be used in future on-line research.

H.       FINAL  THOUGHTS 

It  is  hoped  that  this  dissertation  will  help  to  bridge  the  current  multi-disciplinary 
gap  among  multimedia  and  VE  developers.  Furthermore,  this  dissertation  is  intended  to 
become  the  key  reference  that  researchers  need  to  read  before  attempting  to  evaluate 
multi-modal  perceptual  effects  in  combined  auditory  and  visual  displays. 
APPENDIX  D.  INTERNET  RESOURCES

The first section of this appendix contains the URLs of some research institutions
that are currently conducting research on various aspects of sound. The second section
contains the URLs of various sound-related commercial products.

Auditory Perception Lab, Dept. of Psychology, University of California,
Berkeley: http://ear.berkeley.edu/auditory_lab/

Center  for  Computer  Research  in  Music  and  Acoustics  (CCRMA),  Dept.  of 
Music,  Stanford  University:  http://ccrma-www.stanford.edu/Welcome.html 

Center for Experimental Music and Intermedia (CEMI), University of North
Texas: http://www.scs.unt.edu/cemi/cemi.htm

Center  for  New  Music  and  Audio  Technologies  (CNMAT),  University  of 
California,  Berkeley:  http://www.cnmat.berkeley.edu/ 

Center  for  Research  in  Computing  and  the  Arts  (CRCA),  University  of 
California,  San  Diego:  http://crca-www.ucsd.edu 

Center  for  Research  in  Electronic  Art  Technology  (CREATE),  Dept.  of  Music, 
University  of  California,  Santa  Barbara:  http://www.ccmrc.ucsb.edu/ 

Center for Studies in Music Technology (CSMT), Yale University:
http://www.music.yale.edu/

Dipartimento di Ingegneria Industriale, University of Parma, Angelo Farina:
http://pcfarina.eng.unipr.it/

Faculty  of  Music,  McGill  University,  Montreal:  http://www.music.mcgill.ca/ 

Graphics, Visualization, and Usability Center, Georgia Tech:
http://www.cc.gatech.edu/gvu/multimedia/

Harvard Computer Music Center, Harvard University:
http://www-mario.harvard.edu

Hearing  Development  Research  Laboratory  (HDRL),  Waisman  Center, 
University  of  Wisconsin:  http://www.waisman.wisc.edu/hdrl/ 


Human Interface Technology Lab (HIT LAB), University of Washington:
http://www.hitl.washington.edu/

Human Research and Engineering Directorate (HRED), Army Research
Laboratory: http://www.arl.mil/ARL-Directorates/HRED/hred.html

Image Synthesis Group, Dept. of Computer Science, Trinity College, Dublin:
http://vangogh.cs.tcd.ie/

Institut  de  Recherche  et  Coordination  Acoustique/Musique  (IRCAM),  Institute 
for  Acoustic/Music  Research:  http://www.ircam.fr 

Interval  Research  Corporation,  Palo  Alto,  California:  http://www.interval.com 

Laboratory  of  Acoustics  and  Audio  Signal  Processing,  Helsinki  University  of 
Technology  (HUT):  http://www.hut.fi/HUT/Acoustics/index.html 

Machine Listening Group, MIT Media Lab, Massachusetts Institute of
Technology: http://sound.media.mit.edu/

National  Center  for  Supercomputing  Applications  (NCSA),  University  of 
Illinois  at  Urbana-Champaign:  http://www.ncsa.uiuc.edu/ 

NASA Ames Research Center, Moffett Field, California:
http://www.arc.nasa.gov/

NAVE Research Group, Dept. of Computer Science, University of Colorado at
Boulder: http://www.cs.colorado.edu/~cboyd/

Norwegian  network  for  Technology,  Acoustics  and  Music  (NoTAM),  University 
of  Oslo:  http://www.notam.uio.no/index-e.html 

Parmly Hearing Institute, Loyola University Chicago:
http://parmly-2.ls.luc.edu/parmly/

Princeton Sound Kitchen, Princeton University:
http://www.music.princeton.edu:80/PSK/

SCCP Virtual Reality SOUND, University of Aizu:
http://www-ci.u-aizu.ac.jp/VirtualReality/WWW/sound.html

Sound Localization Research, San Jose State University:
http://www-engr.sjsu.edu/~duda/Duda.Research.html
Visual Systems Laboratory, University of Central Florida:
http://www.vsl.ist.ucf.edu/

The WORLDSONG Project: http://www.hyperreal.com/~mpesce/worldsong.html

York University Music Technology Group, The University of York:
http://www.york.ac.uk/inst/mustech/3d_audio/ambison.htm


This section contains the URLs of various sound-related commercial products.

AdB  International  Corporation:  http://www.adbdigital.com/ 

Aureal  Semiconductor:  http://www.aureal.com 

The  Binaural  Source:  http://www.btown.com/binaural.html 

CATT: http://www.netg.se/~catt/

Chromatic  Research:  http://www.chromatic.com/ 

Circle Surround: http://www.surround.net/

Creative  Labs:  http://www.creaf.com/ 

Crystal  River  Engineering:  http://www.cre.com/index.html 

DirectSound  Xtra:  http://www.directxtras.com/ds_home.htm 

Dolby  Laboratories:  http://www.dolby.com/ 

E-mu  Systems  Inc.:  http://www.emu.com/ 

Ensoniq  Corporation:  http://www.ensoniq.com/ 

Firsthand:  http://www.firsthand.com/ 

HeadRoom:  http://headroom.headphone.com/ 

Headspace: http://www.headspace.com

HoonTech: http://www.hoontech.co.kr/hoontech_eng.html

Lake  DSP:  http://www.lakedsp.com/ 
Level Control Systems: http://www.lcsaudio.com/lcs.html

Lexicon: http://www.lexicon.com/

MIDI  Home  Page:  http://www.eeb.ele.tue.nl/midi/index.html 

MIDI  Manufacturers  Association:  http://www2.midi.org/mma/ 

Muscle  Fish:  http://www.musclefish.com/ 

NuReality:  http://www.nureality.com/ 

Paradigm  Simulation  Inc.:  http://www.paradigmsim.com/ 

Pyramid Systems: http://tmgweb.com/psi/

Qsound:  http://www.qsound.ca/ 

RealAudio:  http://www.real.com/ 

Reality  by  Design,  Inc.:  http://www.rbd.com/ 

Realistic  Sound  Experience  (RSX)  Technology:  http://www.intel.com/ial/rsx/ 

Roland  Sound  Space:  http://www.rolandcorp.com/products/PA/RSS-10.html 

SENSE8:  http://www.sense8.com/ 

Sound  Retrieval  System  (SRS):  http://www.srslabs.com/ 

Sony IMAX Theatre: http://www.spe.sony.com/Pictures/sonytheatres/imax/imaxtech.html

Spatializer Audio Laboratories: http://www.catalog.com/cgibin/var/3dstereo/index.html

Symbolic  Sound  Corporation:  http://www.SymbolicSound.com/ 

THX:  http://www.thx.com/ 

Tucker-Davis  Technologies  Inc.:  http://tdt-quikki.com/ 

Unofficial SGI Audio Apps List: http://reality.sgi.com/employees/cook/audio.apps/

Virtual  Audio  Imager  (VAI):  http://www.purestereo.com/brown.html 

Visual  Synthesis  Incorporated  (VSI):  http://www.vsicorp.com/ 